2

I want to divide a semicolon separated string into its parts with PL/SQL. It’s working fine with REGEXP_SUBSTR as long as there’s no round bracket in the string.

Example:

select REGEXP_SUBSTR('A;B;C','[^(";")]+',1,1),
REGEXP_SUBSTR('A;B;C','[^(";")]+',1,2),
REGEXP_SUBSTR('A;B;C','[^(";")]+',1,3)
from dual;

Result as expected is: A B C

The result for A;B(1);C should be A B(1) C but what I get is : A B 1

select REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,1),
REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,2),
REGEXP_SUBSTR('A;B(1);C','[^(";")]+',1,3)
from dual;

That means '(' is detected as delimiter, but I do not understand this behavior. Can someone please enlighten me?

2 Answers2

4

[] is a Multilingual Regular Expression Syntax which says "Bracket expression for specifying a matching list that should match any one of the expressions represented in the list. A nonmatching list expression begins with a circumflex (^) and specifies a list that matches any character except for the expressions represented in the list."

For example

select REGEXP_SUBSTR('"A";"B(1)";"C"','[^";"]+',1,1)
from dual;

will return A B(1) C where [^";"]+ considers " OR ; as separator

Similarly in your example [^(";")]+ considers " OR ; OR ( OR ) as separators against your expectation.

So for your expected output, you can try

select REGEXP_SUBSTR('A;B(1);C','[^;]+',1,1),
REGEXP_SUBSTR('A;B(1);C','[^;]+',1,2),
REGEXP_SUBSTR('A;B(1);C','[^;]+',1,3)
from dual;
Srini V
  • 11,045
  • 14
  • 66
  • 89
1

Once again I climb on my soapbox to warn folks about the dangers of using the regex of the format '[^;]+' to parse delimited strings. Repent and be saved! It does not handle NULL elements and WILL return unexpected results. See here for more info and proof. Please use this format instead and sleep easy knowing your output is accurate:

Note the 2nd element is (NULL)

SQL> with tbl(str) as (
      select 'A;;B(1);C' from dual
    )
    select regexp_substr(str, '(.*?)(;|$)', 1, level, NULL, 1)
    from tbl
    connect by level <= regexp_count(str, ';') + 1;

REGEXP_SU
---------
A

B(1)
C

SQL>

Note the NULL returned for element 2, as expected. If you use the regex format '[^;]+' and attempt to get the 2nd element, you will get 'B(1)' which is incorrect as it is the 3rd element:

DON'T USE:

SQL> with tbl(str) as (
  2    select 'A;;B(1);C' from dual
  3  )
  4  select regexp_substr(str, '[^;]+', 1, level)
  5  from tbl
  6  connect by level <= regexp_count(str, ';') + 1;

REGEXP_SU
---------
A
B(1)
C


SQL>

Look close, the NULL is last. Imagine all the incorrect reports out there. Don't let one of them be yours!

Community
  • 1
  • 1
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • I'm aware of this. Replacing '|' by '| ' before using regexp_substr is doing the job in my case. Nevertheless thanks for the hint. – user1838910 Jul 07 '16 at 06:16
  • Not sure what you are talking about but realspirituals deserves the vote as his answer is the correct one for your question. Mine is additional information to be aware of. Whatever you do, make sure you test with all possible combinations of unexpected data you can think of and make sure your solution handles them all! – Gary_W Jul 07 '16 at 13:22