 |
|
Reporting vmstat on Other Oracle UNIX Servers Administration
Oracle UNIX/Linux Tips by Burleson Consulting |
Reporting vmstat Information on Other
Oracle Servers
To get a complete picture of the performance
of your total Oracle system, you must also monitor the behavior of all
of the servers that communicate with Oracle. For example, many Oracle
environments have other servers:
* Oracle Applications?In Oracle Applications
products, you generally have separate application servers
communicating with the database server.
* SAP with Oracle?In SAP, you have separate
application servers that communicate with Oracle.
* Real Application Clusters (Oracle Parallel
Server)?With RAC, you have multiple Oracle database servers, all
sharing the same database.
* Oracle Web Applications?When using Oracle
databases on the Web, you have separate WebServers that direct the
communications into the database.
This technique in get_vmstat.ksh can easily be
extended to measure the performance of other servers in your Oracle
environment. Note that the stats$vmstat table has a column to store
the server name. Since we can separate vmstat metrics by server, we
simply need to create a remote vmstat script that will capture the
performance of the other servers and send the data to a central
database. Because only the database server contains an Oracle
database, the vmstat data will be sent to the database from the remote
server using database links. Any server that has a Net8 client can be
used to capture vmstat information.
If we take a close look at the get_vmstat
script from above, we see that this script can be executed on a remote
server. The script will send the vmstat data to the server that
contains our Oracle database using a database link. Note where the
script enters sqlplus using “sqlplus perfstat/perfstat@prod”.
By collecting the data remotely, we can
capture a complete picture of the performance of all of the components
of the Oracle environment, not just the database server. This is
important in cases where you need to track slow performance of
ecommerce systems. Using this vmstat information, you can go back to
the time of the slowdown and see which Web servers may have been
overloaded and also examine the load on the database server.
Now that we see how to capture server
statistics into Oracle tables, we are ready to see how we can use this
valuable information to ensure that our server is not the cause of
Oracle performance problems. In the next section we will look at some
of the specific causes of server resource shortages and see techniques
that can be used to reduce demands on the CPU and RAM. We will also
explore some prewritten scripts that will automatically alert us to
exceptional server conditions.
Reporting on UNIX Server Statistics
Once the data is captured in the stats$vmstat
table, there is a wealth of reports that can be generated. Because all
of the server statistics exist inside a single Oracle table, it is
quite easy to write SQL*Plus queries to extract the data.
The vmstat data can be used to generate all
types of interesting reports. There are four classes of vmstat
reports:
* Exception reports?These reports show the
time period where predefined thresholds are exceeded.
* Daily trend reports?These reports are often
run and used with Excel spreadsheets to produce trending graphs.
* Hourly trend reports?These reports show the
average utilization, averaged by the hour of the day. These reports
are very useful for showing peak usage periods in a production
environment.
* Long-term predictive reports?These reports
generate a long-term trend line for performance. The data from these
reports is often used with a linear regression to predict when
additional RAM memory or CPU power is required for the server.
Let’s now examine the script that can be used
to generate these server reports and see how this information can help
us tune our Oracle database.
Server Exception Reports
The SQL script vmstat_alert.sql can quickly
give a complete exception report on all of the servers in our Oracle
environment. This report will display times when the CPU and RAM
memory exceed your predefined thresholds:
set lines 80;
set pages 999;
set feedback off;
set verify off;
column my_date
heading 'date hour' format a20
column c2 heading runq format 999
column c3 heading pg_in format 999
column c4 heading pg_ot format 999
column c5 heading usr format 999
column c6 heading sys format 999
column c7 heading idl format 999
column c8 heading wt format 999
ttitle 'run queue >
2|May indicate an overloaded CPU|When runqueue exceeds
he number of CPUs| on the server, tasks are waiting for service.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
runque_waits > 2
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'page_in >
1|May indicate overloaded memory|Whenever Unix performs a page-in, the
RAM memory | on the server has been exhausted and swap pages are being
used.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
page_in > 1
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
ttitle 'user+system
CPU > 70%|Indicates periods with a fully-loaded CPU
subssystem.|Periods of 100% utilization are only a | concern when
runqueue values exceeds the number of CPs on the server.';
select
server_name,
to_char(start_date,'YY/MM/DD HH24') my_date,
avg(runque_waits) c2,
avg(page_in) c3,
avg(page_out) c4,
avg(user_cpu) c5,
avg(system_cpu) c6,
avg(idle_cpu) c7
from
perfstat.stats$vmstat
WHERE
(user_cpu + system_cpu) > 70
and start_date > sysdate-&&1
group by
server_name,
to_char(start_date,'YY/MM/DD HH24')
ORDER BY
server_name,
to_char(start_date,'YY/MM/DD HH24')
;
The standard vmstat alert report is used to
alert the Oracle DBA and systems administrator to out-of-bounds
conditions on each Oracle server. These conditions include:
* CPU waits > 40% (AIX version only)?This may
indicate I/O-based contention. The solution is to spread files across
more disks or add buffer memory.
* Runqueue > xxx – (where xxx is the number of
CPUs on the server, 2 in this example)?This indicates an overloaded
CPU. The solution is to add additional processors to the server.
* Page_in > 2?Page-in operations indicate
overloaded memory. The solution is to reduce the size of the Oracle
SGA, PGA, or add additional RAM memory to the server.
* User CPU + System CPU > 90%?This indicates
periods where the CPU is highly utilized.
While the SQL here is self-explanatory, let’s
look at a sample report and see how it will help our systems
administrator monitor the server’s behavior:
SQL> @vmstat_alert 7
Wed Dec
20 page
1
run queue > 2
May indicate an overloaded CPU.
When runqueue exceeds the number of CPUs
on the server, tasks are waiting for service.
SERVER_NAME
date hour runq pg_in pg_ot usr sys idl
--------------- -------------------- ---- ----- ----- ---- ----
----
AD-01 00/12/13 17 3 0 0 87 5
8
Wed Dec
20 page
1
page_in > 1
May indicate overloaded memory.
Whenever Unix performs a page-in, the RAM memory
on the server has been exhausted and swap pages are being
used.
SERVER_NAME
date hour runq pg_in pg_ot usr sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
AD-01 00/12/13 16 0 5 0 1 1
98
AD-01 00/12/14 09 0 5 0 10 2
88
AD-01 00/12/15 16 0 6 0 0 0
100
AD-01 00/12/19 20 0 29 2 1 2
98
PROD1DB 00/12/13 14 0 3 43 4 4
93
PROD1DB 00/12/19 07 0 2 0 1 3
96
PROD1DB 00/12/19 11 0 3 0 1 3
96
PROD1DB 00/12/19 12 0 6 0 1 3
96
PROD1DB 00/12/19 16 0 3 0 1 3
96
PROD1DB 00/12/19 17 0 47 68 5 5
91
Wed Dec
20 page
1
user+system > 70%
Indicates periods with a fully-loaded CPU sub-system.
Periods of 100% utilization are only a
concern when runqueue values exceeds the number of CPUs on the
server.
SERVER_NAME
date hour runq pg_in pg_ot usr sys idl
----------------- -------------------- ---- ----- ----- ---- ----
----
AD-01 00/12/13 14 0 0 2 75 2
22
AD-01 00/12/13 17 3 0 0 87 5
8
AD-01 00/12/15 15 0 0 0 50 29
22
AD-01 00/12/15 16 0 0 0 48 33
20
AD-01 00/12/19 07 0 0 0 77 4
19
AD-01 00/12/19 10 0 0 0 70 5
24
AD-01 00/12/19 11 1 0 0 60 17
24
PROD1 00/12/19 12 0 0 1 52 30
18
PROD1 00/12/19 13 0 0 0 39 59
2
PROD1 00/12/19 14 0 0 0 39 55
6
PROD1 00/12/19 15 1 0 0 57 23
20
You may notice that this exception report
gives the hourly average for the vmstat information. If you look at
the get_vmstat.ksh script, you will see that the data is captured in
intervals of every 300 elapsed seconds (5-minute intervals). Hence, if
you see an hour where your server is undergoing stress, you can modify
your script to show the vmstat changes every five minutes. You can
also run this report in conjunction with other STATSPACK reports to
identify what tasks may have precipitated the server problem. The
stats$sql_summary table is especially useful for this purpose.
 |
If you like Oracle tuning, see the
book "Oracle
Tuning: The Definitive Reference", with 950 pages of tuning
tips and scripts.
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |
 |
Expert Remote DBA
BC is America's oldest and largest Remote DBA Oracle support
provider. Get real Remote DBA experts, call
BC Remote DBA today. |
 |
|