 |
|
Parallel Queries and Distributed Tables
Oracle Tips by Burleson Consulting |
There is an alternative to parallelism for
Oracle distributed queries. In a distributed environment, Oracle
parallel query can be simulated when using Net8 to perform
simultaneous remote queries on each remote Net8 server.
Interestingly, double parallelism can be achieved if the remote
server invokes parallel full-table scans. While each query is
executing simultaneously on each remote server, each query can be
using Oracle parallel query.
These types of parallel queries are most
useful in distributed databases where a single logical table has
been partitioned into smaller tables at each remote node. This
approach is very common in ERP applications where each remote Net8
server requires autonomy, while the corporate office requires a
method to query each cloned table as if it were local. For example,
a customer table that is ordered by customer name may be partitioned
into a customer table at each remote database, such that we have a
new_york_employee table, a california_employee table,
and so on. This vertical table partitioning approach is very common
with distributed databases where local autonomy of processing is
important.
With the tables partitioned onto different
databases at different geographical locations, how can we meet the
needs of the corporate headquarters where a complete view is
required? How can they query all of these remote tables as a single
unit and treat the logical customer table as a single entity? For
large queries that may span many logical tables, the isolated tables
can then easily be reassembled to use Oracle's parallel query
facility:
create view
all_employee
as
select *
from
new_york_employee@manhattan
UNION ALL
select *
from
california_employee@los_angeles
UNION ALL
select *
from
japan_employee@tokyo;
We can now query the all_employee view
as if it were a single database table, and Oracle will automatically
recognize the union all clause and fire off simultaneous
queries against each of the three base tables. It is important to
note that the distributed database manager will direct that each
query be processed at the remote location, while the query manager
waits until each remote node has returned its result set. For
example, the following query will assemble the requested data from
the three tables in parallel, with each query being separately
optimized. The result set from each subquery is then merged by the
query manager process and delivered to the front-end application.
select
employee_name
from
all_employee
where
salary > 500000;
In a query like this, a large distributed
transaction will invoke parallel queries at each remote database.
Note that these parallel queries will not necessarily perform
full-table scans, but the remote data requests will be issued and
managed by the Oracle Distributed Recovery Manager (RECO) process
(Figure 10-4).
Figure 4: A
distributed parallel query
Next, let’s wrap up this chapter and cover
the major points to consider when evaluating and tuning full-table
scans.
Conclusion
This chapter has been concerned with
identifying full-table scans, evaluating the legitimacy of the
full-table scan, tuning full-table scans for optimal performance,
and monitoring full-table scan activity in your database. The main
points of this chapter include these:
-
Oracle will often perform a full-table scan even if a faster way
exists to access the table.
-
All full-table scans should be evaluated to see if the full-table
scan can be replaced by index access. Always question the
legitimacy of a full-table scan.
-
Large-table full-table scans will run far faster with Oracle
parallel query.
-
Small-table full-table scans will run faster by placing the table
blocks in the KEEP pool.
-
Turning on parallelism with the alter table command is
dangerous because it can change the behavior of the execution
plans for all queries that involve the table.
-
The degree of parallelism depends on the number of CPUs and the
distribution of a table across disks.
-
You can monitor the parallel query slave behavior by using the
v$pq_tqstat view. You should always make sure that the
parallel query slaves are accessing an equivalent number of rows.
-
You can monitor the historical instance-wide behavior of queries
by using STATSPACK. When a period of high parallel query activity
is identified, you can go to the STATSPACK stats$sql_summary
table and extract and evaluate the individual SQL statements.
-
Parallelism can be used to improve the speed of table joins,
including nested loop joins, hash joins, and sort merge joins.
-
Legitimate sort merge joins should always use parallel query
because a sort merge join requires full-table scans against both
tables.
Next, let’s move on and take a look at
optimizing sorting operations for Oracle SQL statements.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.