How to select sub string in oracle?

Question

I have a scenario where my data is something like below:

Chapter 18 Unit 10 Sect 16

Case 1 : I want to select Chapter 18 from the above string.
Case 2 : I want to select Unit 10 from the above string.
Case 3 : I want to select Sect 16 from the above string.

are numbers constant? could it be `chapter 45 unit 145 sect 2`? How about case sensitive? — neshkeev, Sep 10 '14 at 14:50
What have you tried? There are many, many similar questions on stackoverflow. — Patrick Bacon, Sep 10 '14 at 15:19

yamny · Answer 1 · 2014-09-10T14:59:37.597

4

I would use REGEXP_SUBSTR (documentation), with right regular expressions. For example:

select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Chapter \d*') from dual;
  --Will return: Chapter 18
select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Unit \d*') from dual;
  --Will return: Unit 10
select regexp_substr('Chapter 18 Unit 10 Sect 16', 'Sect \d*') from dual;
  --Will return: Sect 16

Of course if you store Chapter xx Unit yy Sect zz strings in table, then you simply use this kind of query to get multiple results:

select regexp_substr(info_column, 'Chapter \d*') from mytable;

You can replace \d with [0-9] or [[:digit:]]

SQLfiddle

edited Sep 10 '14 at 14:59

answered Sep 10 '14 at 14:44

yamny

660
4
13

2

No problem with regular expressions until you encounter issues with high CPU resource consumption. I would rather use old SUBSTR for such simple requirement. – Lalit Kumar B Sep 10 '14 at 14:52
Of course I would suggest normal `substr` if the values in the string in question would always be two-digit. – yamny Sep 10 '14 at 17:09
See my update. I have shown REGEXP is slower using zaratustra's test case. His test case was incorrect and when corrected it shows the new execution times to prove the fact. – Lalit Kumar B Sep 11 '14 at 09:21

Lalit Kumar B · Answer 2 · 2014-09-11T08:53:35.523

I would use the old SUBSTR rather than REGEXP. Since REGEXP will take up too much CPU resource.

Your requirement is quite simple.

For data as Chapter 18 Unit 10 Sect 16, if you want Chapter 18 as output :

Just execute :

Select substr(column, 1, 10) from table

Similarly, you can do for other outputs.

Edit : Folks, some of you might think why am I stressing on good old SUBSTR over REGEXP. Just put trace on, and look at the trace output. I agree, with newer versions, Oracle has made a lot of improvement to REGEXP. But, till date I haven't seen a case where I got satisfied. I might be wrong, so, if anyone has a test case, I would really like to see it. It would a good learning for all of us.

Update to show the above test case about REGULAR EXPRESSION faster than SUBSTR + INSTR, is wrong!

REGULAR EXPRESSION :

SQL> DECLARE
  2      l_start NUMBER := dbms_utility.get_cpu_time;
  3  BEGIN
  4      FOR i IN (WITH t
  5                     AS (SELECT 'Chapter '
  6                                || LEVEL
  7                                || ' Unit '
  8                                || LEVEL
  9                                || ' Sect '
 10                                || LEVEL d
 11                         FROM   dual
 12                         CONNECT BY ROWNUM < 100000)
 13                SELECT Regexp_substr(d, 'Chapter [0-9]*') chapter,
 14                       Regexp_substr(d, 'Unit [0-9]*')    unit,
 15                       Regexp_substr(d, 'Sect [0-9]*')    sect
 16                 FROM   t) LOOP
 17          NULL;
 18      END LOOP;
 19
 20      dbms_output.Put_line('time taken by REGULAR EXPRESSION : '
 21                           || ( dbms_utility.get_cpu_time - l_start )
 22                           || ' hsec');
 23  END;
 24
 25  /
time taken by REGULAR EXPRESSION : 61 hsec

PL/SQL procedure successfully completed.

SUBSTR + INSTR :

SQL> DECLARE
  2      l_start NUMBER := dbms_utility.get_cpu_time;
  3  BEGIN
  4      FOR i IN (WITH t
  5                     AS (SELECT 'Chapter '
  6                                || LEVEL
  7                                || ' Unit '
  8                                || LEVEL
  9                                || ' Sect '
 10                                || LEVEL d
 11                         FROM   dual
 12                         CONNECT BY ROWNUM < 100000)
 13                SELECT Substr(d, 1, Instr(d, ' ', 1, 2) - 1)
 14                       chapter,
 15                       Substr(d, Instr(d, ' ', 1, 2),
 16                       Instr(d, ' ', 1, 4) - Instr(d,
 17                       ' ', 1, 2))
 18                       unit,
 19                       Substr(d, Instr(d, ' ', 1, 4), Length(d) - Instr(d, ' ', 1,
 20                                                                  4)
 21                                                      + 1)
 22                       sect
 23                 FROM   t) LOOP
 24          NULL;
 25      END LOOP;
 26
 27      dbms_output.Put_line('time taken by SUBSTR + INSTR : '
 28                           || ( dbms_utility.get_cpu_time - l_start )
 29                           || ' hsec');
 30  END;
 31
 32  /
time taken by SUBSTR + INSTR : 28 hsec

PL/SQL procedure successfully completed.

So, it can be clearly seen SUBSTR + INSTR took less than half the time that of REGULAR EXPRESSION.

As long as the numbers are always two digits, and the elements are always in that order. Not sure that's clear from the question. ("something like..." is always a bit of a concern!). Regex is more flexible, but obviously potentially at a cost. — Alex Poole, Sep 10 '14 at 15:12
Take a look at my answer. REGEXP beats SUBSTR + INSTR dramatically. — neshkeev, Sep 10 '14 at 15:13
@Alex, I don't hate regular expressions. Just that I don't like their performance in batch processes or on OLTP systems. They are so much resource consuming. — Lalit Kumar B, Sep 10 '14 at 15:39

Maheswaran Ravisankar · Answer 3 · 2014-09-10T15:00:05.343

0

My version, useful when the data length is NOT fixed.! This might be bit generic

Last argument to the regexp_substr actually returns the corresponding case result!

SELECT level as case ,
       regexp_substr('Chapter 180 Unit 10 Sect 16 World 100', '\w* \d*( )*',1,level) as result
FROM dual
  CONNECT BY level <= CEIL(regexp_count('Chapter 180 Unit 10 Sect 16 World 100',' ')/2)

Result:

      CASE RESULT
---------- ------------------------
         1 Chapter 180
         2 Unit 10
         3 Sect 16
         4 World 100

Fiddle Demo

edited Sep 10 '14 at 15:00

answered Sep 10 '14 at 14:54

Maheswaran Ravisankar

17,652
6
47
69

They may not be OLTP transactions friendly, but really flexible! – Maheswaran Ravisankar Sep 11 '14 at 10:07
Flexible, yes. Resource hungry, yes :-) – Lalit Kumar B Sep 11 '14 at 10:08
Yes Bro! usually these kind requirements could be in DW.. I use it often. Not online. And my oracle is hosted over `EXADATA` .. resource abuse, we never heard of from DBA.. we see only under usage :p – Maheswaran Ravisankar Sep 11 '14 at 10:11
1

Lucky guy to be on `EXADATA`, lol! Anyway, my only concern is that the answered marked as correct is demonstrating an incorrect test case. – Lalit Kumar B Sep 11 '14 at 10:18

neshkeev · Accepted Answer · 2014-09-12T12:07:15.397

Using substr:

declare
  l_start number := DBMS_UTILITY.get_cpu_time;
begin
for i in (
with t as (
  select 'Chapter ' || level || ' Unit ' || level || ' Sect ' || level  d from dual connect by rownum < 100000
)
select substr(d, 1, instr(d, ' ', 1, 2) - 1) chapter
     , substr(d, 
          instr(d, ' ', 1, 2), 
          instr(d, ' ', 1, 4) - instr(d, ' ', 1, 2)
       ) unit
     , substr(d, 
          instr(d, ' ', 1, 4), 
          length(d) - instr(d, ' ', 1, 4) + 1
       ) sect 
  from t
)
loop
  null;
end loop;
 DBMS_OUTPUT.put_line((DBMS_UTILITY.get_cpu_time - l_start) || ' hsec');
end;

126 hsec

Using regexp:

declare
  l_start number := DBMS_UTILITY.get_cpu_time;
begin
for i in (
with t as (
  select 'Chapter ' || level || ' Unit ' || level || ' Sect ' || level  d from dual connect by rownum < 100000
)
select regexp_substr(d, 'Chapter [0-9]*') chapter
     , regexp_substr(d, 'Unit [0-9]*') unit
     , regexp_substr(d, 'Sect [0-9]*') sect 
  from t
)
loop
  null;
end loop;
 DBMS_OUTPUT.put_line((DBMS_UTILITY.get_cpu_time - l_start) || ' hsec');
end;

190 hsec

So the solution with regexp is slower, but it is more readable, if I were you I would use regexp.

That is completely wrong test case. Your REGEXP query doesn't use the column 'd', rather you have just hardcoded 'Chapter 18 Unit 10 Sect 16'. Just replace it with column d and see the difference. I will edit my post to add your test case to prove my statement that REGEXP will be slower and will consume more resources. — Lalit Kumar B, Sep 11 '14 at 08:40
In the new update, perhaps you copy pasted the substr code in the regexp, so both are same codes. You might have to edit again. — Lalit Kumar B, Sep 12 '14 at 03:48

How to select sub string in oracle?

4 Answers4

Linked