 |
|
11g Regular Expressions Enhancements
Oracle 11g New Features Tips by Burleson
Consulting
June 27, 2008 |
Oracle 11g SQL New Features Tips
The use of Regular Expressions for pattern
matching was introduced in Oracle10g. Oracle 10g included four
expression matching functions:
-
regexp_like Returns true if the pattern is
matched; otherwise it is false.
-
regexp_instr Returns the position of the start
or end of the matching string. Returns zero if not the
pattern, and does not match.
-
regexp_replace Returns a string where each matching
string is replaced with the text specified.
-
regexp_substr Returns the matching string, or NULL if no
match is found.
Oracle 11g adds functionality to use sub
patterns in regex_instr
and regexp_substr.
The new release also adds a new function called
regexp_count:
All five functions can be used in SQL statements
or PL/SQL. They operate on the database character datatypes to
include VARCHAR2, CHAR, CLOB, NVARCHAR2, NCHAR, and NCLOB.
regexp_instr
Syntax:
regexp_instr(source, pattern, position, occurrence, begin_end,
options, subexp)
The source of
regexp_instr can be a string literal, variable, or column,
and the pattern is the expression to be replaced. The optional
position is the location to begin the search, which defaults to 1.
Occurrence defines the occurrence you are looking for, and
begin_end defines whether
the position is wanted in the beginning or end of the occurrence.
This defaults to 0 when the beginning of the occurrence is wanted
for the position. One the other hand, to get the end position,
one is used. The matching options are:
New to 11g is the subexpression parameter.
The subexpression parameter identifies the sub expression of the
pattern that is the actual match. Furthermore, this parameter
is an integer that identifies the sub expression of the pattern.
If subexpression is not included or defined as 0, the entire pattern
is used. If a subexpression is used, but the pattern does not
have that subexpression, the function returns a zero. It
is important to note that if the subexpression is defined as NULL,
the function returns NULL.
select
regexp_instr('We are driving south by south east','south')
from dual;
16
select
regexp_instr('We are driving south by south east',
'south', 1, 2, 1)
from dual;
30
Here is a look at the sub expression, first
using the entire pattern.
select
regexp_instr('We are driving south by south east',
'((sou)(th))', 1, 2, 1,null,0)
from dual;
30
Next, the first subexpression, which in this
instance is the pattern, needs to be used
select
regexp_instr('We are driving south by south east',
'((sou)(th))', 1, 2, 1,null,1)
from dual;
30
At this point, the query can be executed using
the second subexpression. As can be seen, the answer changed.
select
regexp_instr('We are driving south by south east',
'((sou)(th))', 1, 2, 1,null,2)
from dual;
28
Finally, the function below will always return a
NULL since the subexpression is defined as NULL.
select
regexp_instr('We are driving south by south east',
'((sou)(th))', 1, 2, 1,null,null)
from dual;
Nothing was returned => NULL.
regexp_substr
Syntax:
regexp_substr(source, pattern, position, occurrence, options, subexp)
The source for
regexp_substr can be a
string literal, variable, or column, and the pattern is the
expression to be replaced. Again, the optional position is the
location to begin the search, and defaults to 1. Optional
occurrence define the occurrence that is being sought. The
matching options are the same. Again, as in
regexp_instr, 11g adds the
ability to use a subexpression for the actual match.
select
regexp_substr('We are driving south by south east',
'((sou)(th))',1,1,null,0)
from dual;
south
Now the same query using the second
subexpression.
select
regexp_substr('We are driving south by south east',
'((sou)(th))',1,1,null,2)
from dual;
south
regexp_count
Syntax:
regexp_count(source, pattern, position, options)
As with the other REGEXP functions, in
regexp_count the source
can be a string literal, variable, or column. The pattern is
the regular expression, and the position indicates the character in
the source that the search begins. The default is 1, or the
first character. The matching options are the same as the
other REGEXP functions.
select
regexp_count('We are driving south by south east', 'south')
from dual;
2
Using what Oracle refers to as the
Perl-influenced patterns, the number of words can be counted.
select
regexp_count('We are driving south by south east', '\w+')
from dual;
7
It is important to mention that the
regexp_count function
ignores subexpression.
As has been shown, using Regular Expressions can
add a powerful pattern-matching capability to the SQL and PL/SQL
toolbox. However, Regular Expression should not be used when a
simple LIKE clause will work. This is because there is much
more overhead to matching a pattern using Regular Expression, as
opposed to character wild cards. If all rows where
last_name starts with
‘GAR’ is being sought, the LIKE clause (where xxx like ‘GAR%’) will
perform better than using Regular Expression. But as searches
become more complicated, pattern matching with Regular Expression
can be very powerful.
 |
This is an
excerpt from the new book
Oracle 11g New Features: Expert Guide to the Important
New Features by John Garmany, Steve Karam, Lutz Hartmann, V. J.
Jain, Brian Carr.
You can buy it direct from the publisher
for 30% off. |