 |
|
Sort Merge Join
Oracle Tips by Burleson Consulting |
A sort merge join is an operation that
retrieves two set of rows from the target tables and then sorts each
row set by the join column. The merge phase combines each row from
one set with the matching rows from the other, returning the
intersection of the row sets (see Figure 3-9).
Figure 9: A SQL sort
merge join
There are several permutations of the sort
merge join, an outer merge join, an anti-merge join, and a
semi-merge join. These permutations alter the merge phase from the
goal of finding the intersections of the row sets to finding only
those rows that meet the conditions of the SQL where clause.
STAR Query Join
With this new join method, data warehouse
queries can run at blistering speeds, in some cases dozens of times
faster than the original query. The STAR schema design involves
creating a main fact table that contains all of the primary keys in
the related data warehouse tables. This massive denormalization of
the database structure means that just about any query against the
STAR schema is going to involve the joining of many large tables,
including a large fact table and many smaller reference tables. To
invoke the STAR query path, the following characteristics must be
present:
-
There must be at least three tables being
joined, with one large fact table and several smaller dimension
tables.
-
There must be a concatenated index on the
fact table with at least three columns, one for each of the table
join keys.
-
You must verify with an explain plan that the
NESTED LOOPS operation is being used to perform the join.
Essentially,
Oracle follows a simple procedure for processing STAR queries.
Oracle will first service the queries against the smaller dimension
tables, combining the result set into a Cartesian product table that
is held in Oracle memory. This virtual table will contain all of the
columns from all of the participating dimension tables. The primary
key for this virtual table will be a composite of all of the keys
for the dimension tables. If this key matches the composite index on
the fact table, then the query will be able to process very quickly.
Once the sum
of the reference tables has been addressed, Oracle will perform a
nested loop join of the intermediate table against the fact table.
This approach is far faster than the traditional method of joining
the smallest reference table against the fact table and then joining
each of the other reference tables against the intermediate table.
The speed is a result of reducing the physical I/O. The indexes are
read to gather the virtual table in memory, and the fact table will
not be accessed until the virtual index has everything it requires
to go directly to the requested rows via the composite index on the
fact table.
Connect by
Join
A connect by join is an operation for
the retrieval of rows in a hierarchical order for a SQL query that
contains a connect by clause. The connect by clause is
seldom used in most SQL statements. To see an example of the
connect by clause, please see the plan.sql code that we
use in this book to display SQL execution plans.
A Comparisons of Join Speeds
The reason that Oracle offers several methods
for joining tables is their experience that the nature of the tables
greatly impacts the efficiency of the joining. By far, the three
most popular join methods are the nested loop join, the sort merge
join, and the hash join, in that order.
The sort merge join and the hash join are
very similar in terms of function, while the nested loop join is
quite different from the other join techniques. Table 3-1 summarizes
the relative advantages of each join technique.
Join Type |
Advantages |
Sorting |
Large Row Subsets |
Index Required? |
Sort merge join
Hash join |
Faster throughput |
Yes |
Fast |
No |
Nested loop
join |
Faster
response time |
No |
Slow |
Yes |
Table 1: Relative
Advantages of Join Types
In sum, the nested loop join provides a
faster response time for SQL queries that return small row subsets
from each table. For large row subsets, the sort merge and hash join
may result in faster execution time. Again, we will return to these
join methods in detail in Chapter 16.
Now that you have a basic understanding of
the Oracle join methods, let’s explore the methods used by Oracle to
sort a result set.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.