-4

I have a scenario where my data is something like below:

Chapter 18 Unit 10 Sect 16

  • Case 1 : I want to select Chapter 18 from the above string.
  • Case 2 : I want to select Unit 10 from the above string.
  • Case 3 : I want to select Sect 16 from the above string.
Alex Myers
  • 6,196
  • 7
  • 23
  • 39

4 Answers4

4

I would use REGEXP_SUBSTR (documentation), with right regular expressions. For example:

select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Chapter \d*') from dual;
  --Will return: Chapter 18
select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Unit \d*') from dual;
  --Will return: Unit 10
select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Sect \d*') from dual;
  --Will return: Sect 16

Of course if you store Chapter xx Unit yy Sect zz strings in table, then you simply use this kind of query to get multiple results:

select regexp_substr(info_column, 'Chapter \d*') from mytable;

You can replace \d with [0-9] or [[:digit:]]

SQLfiddle

yamny
  • 660
  • 4
  • 13
  • 2
    No problem with regular expressions until you encounter issues with high CPU resource consumption. I would rather use old SUBSTR for such simple requirement. – Lalit Kumar B Sep 10 '14 at 14:52
  • Of course I would suggest normal `substr` if the values in the string in question would always be two-digit. – yamny Sep 10 '14 at 17:09
  • See my update. I have shown REGEXP is slower using zaratustra's test case. His test case was incorrect and when corrected it shows the new execution times to prove the fact. – Lalit Kumar B Sep 11 '14 at 09:21
2

I would use the old SUBSTR rather than REGEXP. Since REGEXP will take up too much CPU resource.

Your requirement is quite simple.

For data as Chapter 18 Unit 10 Sect 16, if you want Chapter 18 as output :

Just execute :

Select substr(column, 1, 10) from table

Similarly, you can do for other outputs.

Edit : Folks, some of you might think why am I stressing on good old SUBSTR over REGEXP. Just put trace on, and look at the trace output. I agree, with newer versions, Oracle has made a lot of improvement to REGEXP. But, till date I haven't seen a case where I got satisfied. I might be wrong, so, if anyone has a test case, I would really like to see it. It would a good learning for all of us.

Update to show the above test case about REGULAR EXPRESSION faster than SUBSTR + INSTR, is wrong!

REGULAR EXPRESSION :

SQL> DECLARE
  2      l_start NUMBER := dbms_utility.get_cpu_time;
  3  BEGIN
  4      FOR i IN (WITH t
  5                     AS (SELECT 'Chapter '
  6                                || LEVEL
  7                                || ' Unit '
  8                                || LEVEL
  9                                || ' Sect '
 10                                || LEVEL d
 11                         FROM   dual
 12                         CONNECT BY ROWNUM < 100000)
 13                SELECT Regexp_substr(d, 'Chapter [0-9]*') chapter,
 14                       Regexp_substr(d, 'Unit [0-9]*')    unit,
 15                       Regexp_substr(d, 'Sect [0-9]*')    sect
 16                 FROM   t) LOOP
 17          NULL;
 18      END LOOP;
 19
 20      dbms_output.Put_line('time taken by REGULAR EXPRESSION : '
 21                           || ( dbms_utility.get_cpu_time - l_start )
 22                           || ' hsec');
 23  END;
 24
 25  /
time taken by REGULAR EXPRESSION : 61 hsec

PL/SQL procedure successfully completed.

SUBSTR + INSTR :

SQL> DECLARE
  2      l_start NUMBER := dbms_utility.get_cpu_time;
  3  BEGIN
  4      FOR i IN (WITH t
  5                     AS (SELECT 'Chapter '
  6                                || LEVEL
  7                                || ' Unit '
  8                                || LEVEL
  9                                || ' Sect '
 10                                || LEVEL d
 11                         FROM   dual
 12                         CONNECT BY ROWNUM < 100000)
 13                SELECT Substr(d, 1, Instr(d, ' ', 1, 2) - 1)
 14                       chapter,
 15                       Substr(d, Instr(d, ' ', 1, 2),
 16                       Instr(d, ' ', 1, 4) - Instr(d,
 17                       ' ', 1, 2))
 18                       unit,
 19                       Substr(d, Instr(d, ' ', 1, 4), Length(d) - Instr(d, ' ', 1,
 20                                                                  4)
 21                                                      + 1)
 22                       sect
 23                 FROM   t) LOOP
 24          NULL;
 25      END LOOP;
 26
 27      dbms_output.Put_line('time taken by SUBSTR + INSTR : '
 28                           || ( dbms_utility.get_cpu_time - l_start )
 29                           || ' hsec');
 30  END;
 31
 32  /
time taken by SUBSTR + INSTR : 28 hsec

PL/SQL procedure successfully completed.

So, it can be clearly seen SUBSTR + INSTR took less than half the time that of REGULAR EXPRESSION.

Lalit Kumar B
  • 47,486
  • 13
  • 97
  • 124
  • 1
    As long as the numbers are always two digits, and the elements are always in that order. Not sure that's clear from the question. ("something like..." is always a bit of a concern!). Regex is more flexible, but obviously potentially at a cost. – Alex Poole Sep 10 '14 at 15:12
  • Take a look at my answer. REGEXP beats SUBSTR + INSTR dramatically. – neshkeev Sep 10 '14 at 15:13
  • @zaratustra, Sure, let me have a look and test it. – Lalit Kumar B Sep 10 '14 at 15:36
  • @Alex, I don't hate regular expressions. Just that I don't like their performance in batch processes or on OLTP systems. They are so much resource consuming. – Lalit Kumar B Sep 10 '14 at 15:39
0

My version, useful when the data length is NOT fixed.! This might be bit generic

Last argument to the regexp_substr actually returns the corresponding case result!

SELECT level as case ,
       regexp_substr('Chapter 180 Unit 10 Sect 16 World 100', '\w* \d*( )*',1,level) as result
FROM dual
  CONNECT BY level <= CEIL(regexp_count('Chapter 180 Unit 10 Sect 16 World 100',' ')/2)

Result:

      CASE RESULT
---------- ------------------------
         1 Chapter 180
         2 Unit 10
         3 Sect 16
         4 World 100

Fiddle Demo

Maheswaran Ravisankar
  • 17,652
  • 6
  • 47
  • 69
0

Using substr:

declare
  l_start number := DBMS_UTILITY.get_cpu_time;
begin
for i in (
with t as (
  select 'Chapter ' || level || ' Unit ' || level || ' Sect ' || level  d from dual connect by rownum < 100000
)
select substr(d, 1, instr(d, ' ', 1, 2) - 1) chapter
     , substr(d, 
          instr(d, ' ', 1, 2), 
          instr(d, ' ', 1, 4) - instr(d, ' ', 1, 2)
       ) unit
     , substr(d, 
          instr(d, ' ', 1, 4), 
          length(d) - instr(d, ' ', 1, 4) + 1
       ) sect 
  from t
)
loop
  null;
end loop;
 DBMS_OUTPUT.put_line((DBMS_UTILITY.get_cpu_time - l_start) || ' hsec');
end;

126 hsec

Using regexp:

declare
  l_start number := DBMS_UTILITY.get_cpu_time;
begin
for i in (
with t as (
  select 'Chapter ' || level || ' Unit ' || level || ' Sect ' || level  d from dual connect by rownum < 100000
)
select regexp_substr(d, 'Chapter [0-9]*') chapter
     , regexp_substr(d, 'Unit [0-9]*') unit
     , regexp_substr(d, 'Sect [0-9]*') sect 
  from t
)
loop
  null;
end loop;
 DBMS_OUTPUT.put_line((DBMS_UTILITY.get_cpu_time - l_start) || ' hsec');
end;

190 hsec

So the solution with regexp is slower, but it is more readable, if I were you I would use regexp.

neshkeev
  • 6,280
  • 3
  • 26
  • 47
  • That is completely wrong test case. Your REGEXP query doesn't use the column 'd', rather you have just hardcoded 'Chapter 18 Unit 10 Sect 16'. Just replace it with column d and see the difference. I will edit my post to add your test case to prove my statement that REGEXP will be slower and will consume more resources. – Lalit Kumar B Sep 11 '14 at 08:40
  • In the new update, perhaps you copy pasted the substr code in the regexp, so both are same codes. You might have to edit again. – Lalit Kumar B Sep 12 '14 at 03:48