Oracle DBA Forum  Remote DBA SQL Server Remote MSSQL Consulting

 Disk Contention Oracle Tips by Burleson Consulting

Once DBWR has been tuned, the Remote DBA needs to look at disk contention. Disk contention happens when one or more users attempt to read the same disk at the same time, or in some cases, access a different disk through the same controller path at the same time. Spreading Oracle-related files across several platters or sets of platters--the more the better--prevents this. The new RAID options don’t relieve the Remote DBA of file placement concerns. You should be sure that the RAID volumes are properly set. I had one system where a system administrator set up multiple RAID5 volumes using two disks for each volume (a hint: the 5 is a meaningful number for RAID5).

The report in Source 13.17 can be used to monitor relative-fill I/O efficiency--essentially, how many reads are being performed per request as a percent. The more times the process has to access the same datafile to get the same information, the less efficient the datafile is. This could be caused by co-location of indexes and tables, a poorly ordered table that is scanned frequently, or having temporary or undo segments in with data or index segments.

SOURCE 13.17 File efficiency report.

REM
REM NAME     :FILE_EFF.SQL
REM PURPOSE      :GENERATE FILE IO EFFICIENCIES REPORT
REM USE          :FROM STATUS_REPORTS.COM
REM Limitations :MUST BE RUN FROM ORACLE Remote DBA ACCOUNT
REM Revisions:
REM Date          Modified By     Reason For change
REM 10-JUL-1992     M. AULT     INITIAL CREATE
REM 23-Jun-1997            M.Ault     kcffio went away, rewrote to use
REM               existing views/tables
SET PAGES 58 NEWPAGE 0
SET LINES 131
COLUMN eff     FORMAT A6           HEADING '% Eff'
COLUMN ts      FORMAT A22           HEADING 'Tablespace Name'
COLUMN name FORMAT A40           HEADING ‘File Name’
START title132 "FILE IO EFFICIENCY"
BREAK ON ts
DEFINE OUTPUT = 'rep_out/&db/file_io.lis'
SPOOL &OUTPUT
SELECT
f.tablespace_name ts,
f.file_name name,
TO_CHAR(DECODE(v.phyblkrd,0,null,
ROUND(100*(v.phyrds+v.phywrts)/(v.phyblkrd+v.phyblkwrt),2))) eff
FROM Remote DBA_data_files f, v$filestat v WHERE f.file_id=v.file# ORDER BY 1,file#; SPOOL OFF PAUSE Press return to continue This is a cumulative report that gives information based on I/O since the Oracle instance was started. The report generated will list physical block reads and efficiency level (the efficiency number measures the percent of time Oracle asked for and got the right block the first time, which is a function of type of table scan and indexing). An example report is shown in Listing 13.18. LISTING 13.18 File I/O efficiency report. Date: 11/09/01 Page: 1 Time: 06:25 PM FILE IO EFFICIENCY SYS aultdb1 database Tablespace Phys Block Disk name File Name read/writes % Eff ----- ----------- -------------------------------------- ----------- ----- C:\O SYSTEM C:\ORACLE\ORADATA\AULTDB1\SYSTEM01.DBF 3,260 63.74 RBS C:\ORACLE\ORADATA\AULTDB1\RBS01.DBF 1,351 100 USERS C:\ORACLE\ORADATA\AULTDB1\USERS01.DBF 40 100 TEMP C:\ORACLE\ORADATA\AULTDB1\TEMP01.DBF 4,252 85.25 TOOLS C:\ORACLE\ORADATA\AULTDB1\TOOLS01.DBF 891 78.9 INDX C:\ORACLE\ORADATA\AULTDB1\INDX01.DBF 40 100 DRSYS C:\ORACLE\ORADATA\AULTDB1\DR01.DBF 40 100 PERFSTAT C:\ORACLE\ORADATA\AULTDB1\PERFSTAT.DBF 1,206 100 TEST_2K C:\ORACLE\ORADATA\AULTDB1\TEST_2K.DBF 42 100 ***** ----------- sum 11,122 9 rows selected. Points of interest in Listing 13.34 are: · In general, the relatively low efficiency of the SYSTEM and TOOLS areas. This is due to indexes and tables being mixed together in the SYSTEM and TOOLS tablespaces. A classic example, on Oracle’s part, of “Do what we say, not what we do.” · Rollback efficiency should always be 100 percent; if not, someone is using the rollback area for tables. · Index tablespace should always show high efficiencies; if they don't, then either the indexes are bad or someone is using the index areas for normal tables. · An attempt should be made to even-out I/O. In the above example, too much I/O is being done on C:\; some of these data files should be spread to other disks. · This report shows total I/O for the time frame beginning with the Oracle system startup. The results could be stored for two or more dates and times and then subtracted to show the disk I/O for a selected period of time. STATSPACK should be used for this type of measurement. Running this report before and after doing an application test run will give you an idea of the disk I/O profile for the application. This profile, combined with information concerning the maximum I/O supported by each disk or each controller, can help the Remote DBA determine how best to split out the application's files between disks. In a RAID situation, where tracking disk I/O can be problematic, I suggest using one of the tools from Quest or Precise that allow you to track I/O down to the spindle, even in RAID configurations. Tuning to Prevent Contention Contention occurs when a number of users attempt to access the same resource at the same time. This can occur for any database object but is most noticeable when the contention is for rollback segments, redo logs, latches, or locks. You may also experience contention during the processes involved with the multithreaded server. To correct contention, you must first realize that it is occurring. The procedure called in the script shown in Source 13.18 can be used to monitor for contention. The procedure called is a part of the Remote DBA_UTILITIES package, described later in this chapter, in section 13.8, “Using the Remote DBA_UTILITIES Package.” (Note, this package is over 1,200 lines long; if you want to look at it, download it from the Wiley Web site.) The report generated by this script is shown in Listing 13.19. SOURCE 13.18 The RUNNING_STATS calling script. REM REM NAME : DO_CALSTAT.SQL REM FUNCTION :Generate calculated statisitics report using REM FUNCTION :just_statistics procedure REM USE :FROM STATUS.SQL or SQLPLUS REM Limitations : REM Revisions: REM Date Modified By Reason For change REM 05-MAY-1992 Mike Ault Initial Creation REM 23-JUN-1997 Mike Ault Updated to V8 REM SET PAGES 58 NEWPAGE 0 EXECUTE Remote DBA_utilities.running_stats(TRUE); START title80 "CALCULATED STATISTICS REPORT" DEFINE output = rep_out\&db\cal_stat.lis SPOOL &output SELECT * FROM Remote DBA_temp; SPOOL OFF The Remote DBA_UTILITIES package called in Source 13.18 retrieves contention and database health-related statistics, then calculates other statistics based upon those it retrieves. LISTING 13.19 Sample output from running stats listing. Date: 11/07/01 Page: 1 Time: 03:35 PM CALCULATED STATISTICS REPORT Remote DBAUTIL aultdb1 database NAME VALUE -------------------------------- ---------- Startup Date: 07-nov-01 15:22:57 0 CUMMULATIVE HIT RATIO .982755488 sorts (memory) 4891 sorts (disk) 6 redo log space requests 1 redo log space wait time 25 Rollback Wait % 0 Shared Pool Available 56.4824371 Shared SQL% 44.0445754 Shared Pool Used 31.4790573 Data Dictionary Miss Percent 9.64587019 Library Reload % .039244203 table fetch by rowid 14930 table scans (long tables) 85 table scans (short tables) 694 table fetch continued row 383 Non-Index Lookups Ratio .109677419 RECURSIVE CALLS PER USER 35.1794083 SYS UNDO HDR WAIT CONTENTION 0 SYS UNDO BLK WAIT CONTENTION 0 UNDO BLK WAIT CONTENTION 0 UNDO HDR WAIT CONTENTION 0 Free List Contention Ratio 0 library cache .000266132 cache buffers chains .001655418 cache buffers lru chain .009571759 redo writing .038273117 redo allocation .012726424 FUNCTION 6 LIBRARY 16 TRIGGER 22 TABLE 426 SYNONYM 13226 SEQUENCE 59 PROCEDURE 3 PACKAGE BODY 125 PACKAGE 140 OPERATOR 21 LOB 9 VIEW 165 TYPE BODY 23 TYPE 75 JAVA RESOURCE 16 INDEX 478 JAVA CLASS 847 INDEXTYPE 7 TOTAL ALLOCATED MEG 1106.5 TOTAL USED MEG 520.320313 TOTAL SGA SIZE 260407308 Press enter to continue Let’s examine the various statistics gathered by the do_calst2.sql report and see what each means in respect to the database. The first value reported is just the startup time for the instance. Since many of the following statistics are cumulative from the time the instance was started, it was deemed necessary to have a timestamp showing the startup date. Startup Date: 07-nov-01 15:22:57 0 The next value, the cumulative hit ratio, is used to get a feel for performance over time. The cumulative hit ratio is the “average” hit ratio since the database was started, thus it may be low if taken too soon after startup. Usually, you should look for the cumulative hit ratio to be in the range of 0.85 to 1.0 for an OLTP system and 0.6 to 1.0 for a batch system. Generally speaking, if this value is below the applicable range, an increase in the number of db_block_buffers is indicated. CUMMULATIVE HIT RATIO .982755488 The next two statistics deal with how the database is performing sorts. Essentially, you want to minimize disk sorts. Disk sorts are reduced by increasing the size of the sort_area_size initialization parameter. If disk sorts exceed 1 to 2percent of total sorts, tune your sort areas. If disk sorts are required, look into using direct writes to speed sort processing to disks. sorts (memory) 4891 sorts (disk) 6 The next two statistics deal directly with redo log contention. If redo log space waits become excessive (into the hundreds), and wait time is excessive, consider tuning log_buffers and possibly resizing or increasing the number of redo logs. Actually, you would increase the number of redo log groups or the size of individual redo log members. All redo logs should be of equal size. It is suggested that the redo logs be sized such that loss of the online redo log will lose only X amount of data, where X is the critical amount of data for your application (say, an hour’s worth, day’s worth, ten minutes’ worth, etc.). redo log space requests 25 redo log space wait time 0 The next statistic, “Rollback Wait %,” tells how often a process had to wait for a rollback segment. If this value gets near 1.0, consider rollback segment tuning. Rollback Wait % 0 The next set of statistics deal with shared-pool health. The “Shared SQL %” statistic shows the ratio of total SQL areas in the library cache that can’t be reused against the total number of SQL areas (by memory). If more than 40 to 50 percent of your SQL is not reusable, and your used area is greater than 40 to 60 meg, you may suffer performance problems due to shared-pool thrashing. Shared-pool thrashing can be reduced by automated flushing or by reducing the shared-pool size, depending on the amount of non-reusable SQL. Shared Pool Available 56.4824371 Shared SQL% 44.0445754 Shared Pool Used 31.4790573 The next statistic, “Data Dictionary Miss Percent,” shows the miss percents for attempts to read values from the dictionary caches and fails. Dictionary cache misses can be quite expensive, so if this value exceeds 10 percent, you probably need to increase the size of your shared pool, because the individual dictionary caches haven’t been tunable since Oracle 6 (unless you count undocumented parameters). Data Dictionary Miss Percent 9.64587019 The next value, “Library Reload %,” shows how often your library cache has to reload objects that have been aged (or flushed) out of the shared pool. If this value nears 1 percent, look at resizing the shared pool in most systems. In systems where the percentage of reusable SQL is low, this value may be high due to shared pool thrashing. Library Reload % .039244203 The next values deal with table fetch efficiencies: “table fetch by rowid” shows how many rows were fetched due to direct statement of rowid in the select or via index lookups resulting in rowid fetches. This value should be used in concert with the “table fetch continued rows” statistic, to determine if your system is experiencing a high level of row/block chaining. If the ratio between the two values (table fetch continued rows/table fetch by rowid) times100 reaches whole percentage points, examine frequently searched tables containing VARCHAR2 or numeric values that are frequently updated. Columns with LOB, LONG, or LONG RAW values can also result in chaining if their length exceeds the db_block_size. The table scans (long tables) tell how many long full table scans were performed. Ideally, table scans (long tables) should be as small as possible; I try to keep it in a 10:1 ratio with table scans (short tables). The table scans (short_tables) are scans of tables that are less than 2 percent of the value DB_CACHE_SIZE; or, for pre-Oracle9i databases, the product of DB_BLOCK_SIZE and DB_BLOCK_BUFFERS. table fetch by rowid 14930 table scans (long tables) 85 table scans (short tables) 694 table fetch continued row 383 The next statistic, “Non-Index Lookups Ratio,” is a ratio between long table scans and short table scans, and is based on the assumption that a short table scan will most likely be an index scan. The short table versus long table scan is based on a short table being less than 10 percent of the total of db_block_buffers times db_block_size on most systems. The undocumented initialization parameter, “_small_table_threshold” will override this 10 percent rule if it is set for your system. Generally speaking, this value should always be much less than 1.0. If this value even approaches 0.1, look at the number of full table scans being performed on your system. It is suggested that this number be trended so that you know what is normal for your system and can take action if it changes. For a well-tuned system, this value should remain relatively stable. Non-Index Lookups Ratio .109677419 The next statistic, “RECURSIVE CALLS PER USER,” shows how many times, on average, a user process had to repeat a request for information. These repeat calls can be due to dynamic object extension, reparse of SQL statements, or several other recursive database actions. Generally speaking, you want this number as small as possible. On well-tuned systems, I have seen this as low as 7 to 10 per user or lower. RECURSIVE CALLS PER USER 35.1794083 The next four statistics deal with UNDO (rollback) segment contention. Usually these numbers will be at or near 0 for a well-tuned database. Generally, any value approaching 1 indicates the need for rollback segment resizing/tuning. SYS UNDO HDR WAIT CONTENTION 0 SYS UNDO BLK WAIT CONTENTION 0 UNDO BLK WAIT CONTENTION 0 UNDO HDR WAIT CONTENTION 0 The next statistic, “Free List Contention Ratio,” is applicable to Oracle Parallel Server only. If the ratio reaches whole percentages, then look at major tables and rebuild them with better freelist and freelist group structure. This is applicable only if you have freelist groups defined in OPS or RAC, or in Oracle8i or Oracle9i with freelist groups configured in a normal database. Free List Contention Ratio 0 The next set of statistics varies in number from 0 (none shown) to as many as there are latch types in the database. If there is any latch contention for a specific latch, it will be shown in this listing. Generally, these will be in the .001 or less range; if they get into the 0.1 to .01 range, consider tuning the latch that is showing the large contention value, if possible. It is normal for redo copy or redo allocation latches to show contention on RS6000 or single CPU machines. The parallel query latches may also show contention (even Oracle Support can’t tell why; surprise, surprise). library cache .000266132 cache buffers chains .001655418 cache buffers lru chain .009571759 redo writing .038273117 redo allocation .012726424 The next set of statistics give general database object counts. You should be cognizant of the usual values and be aware if any get out of normal bounds. FUNCTION 6 LIBRARY 16 TRIGGER 22 TABLE 426 SYNONYM 13226 SEQUENCE 59 PROCEDURE 3 PACKAGE BODY 125 PACKAGE 140 OPERATOR 21 LOB 9 VIEW 165 TYPE BODY 23 TYPE 75 JAVA RESOURCE 16 INDEX 478 JAVA CLASS 847 INDEXTYPE 7 The next two statistics deal with how allocated filespace is being utilized by the database; they measure the total sum of datafile sizes against the total size of all allocated extents in the database. If you see that your used space is exceeding 90 percent of the total space allocated, then look at a more detailed report of space usage (which follows later in the report list) to see which tablespaces may need additional files allocated. TOTAL ALLOCATED MEG 1106.5 TOTAL USED MEG 520.320313 The final statistic, “TOTAL SGA SIZE,” calculates a total size of the SGA based on the V$SGA view. This is strictly an informational statistic.

TOTAL SGA SIZE                    260407308

There may be an additional set of statistics that deal with database block contention. The “waits” statistics tell how many waits occurred for a given block type, and the “time” statistics tell how long the waits took for each type of block. The data statistics deal with database blocks; the undo statistics deal with rollback segment blocks; and if you get system statistics, they deal with the system rollback segment. If data block waits or data block header waits exceed 100, then look at increasing the number of data base block buffers (note that hit ratio can be satisfactory even with significant contention, so both statistics should be used to determine database block buffer health). If undo block waits are indicated, increase the size of the rollback segment extents; if undo header waits are indicated, increase the number of rollback segment extents. If system block or system header waits are indicated, there may be users other than SYS or SYSTEM assigned to the SYSTEM tablespace.

data block waits                       396

data block time                        324

 This is an excerpt from Mike Ault, bestselling author of "Oracle 10g Grid and Real Application Clusters". You can buy it direct from the publisher for 30%-off and get instant access to the code depot of Oracle tuning scripts.

# Expert Remote DBA

BC is America's oldest and largest Remote DBA Oracle support provider.  Get real Remote DBA experts, call
BC Remote DBA today.