Oracle REGEX_SUBSTR Not Honoring null values

Question

I have an issue of regex_substr not honoring the null value.

select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1)    AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2)    AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3)    AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4)    AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6)    AS pub_indctr
from dual;

If the phn_end_dt is null and pub_indctr is not null, the values of pub_indctr are shifted to phn_end_dt.

Result:-

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P

While it should be

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14               P

Any suggestions ?

score 4 · Answer 1 · edited May 23 '17 at 11:58

I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end
_dt
  2  from dual;

P
-

You need to do this instead I believe (works on 11g):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6,
NULL, 1) phn_end_dt
  2  from dual;

P
-
P

I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls

score 2 · Answer 2 · answered Aug 27 '14 at 14:29

You can solve your task like this:

with t(val) as (
  select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
  select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
     , substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
     , substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
     , substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
     , substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
     , substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
  from t1

     A      B   C       D       E   F
-------------------------------------
2035197553  2   S   14-JUN-14   -   P

Your solution works well whether you use regexp or not. Of course without the regexp, the performance will be better. — Patrick Bacon, Aug 27 '14 at 16:12

Patrick Bacon · Answer 3 · 2014-09-03T17:11:13.810

The typical csv parsing approach is as follows:

WITH t(csv_str) AS
  ( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
  UNION ALL
  SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
  )
SELECT LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t

I like to place a comma preceeding my csv and then I would count the commas with the non-comma pattern.

Explanation of the search pattern

The search pattern looks for the nth substring (nth corresponds with the nth element in the csv) which has the following:

-The pattern begins with a ','

-Next, it is followed by the pattern, '[^,]'. This is just a non-matching list expression. The caret, ^, conveys that the characters following in the list should not be matched.

-This non-matching list of characters has the quantifier, *, which means this can occur 0 or more times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a match is found, I would also use the LTRIM function to remove the comma after I used the reg expression.

What is nice about this approach is the occurrence of the search pattern will always correspond with the occurences of the comma.

Avinash Raj · Answer 4 · 2014-08-27T14:22:29.693

1

You need to change this line,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,

to,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5)    AS phn_end_dt,
                                                   ^

[^,]+ means it matches any character not of , one or more times. [^,]* means it matches any character not of , zero or more times. So [^,]+ assumes that there must be a single character not of , would present. But really there isn't , by changing + to * makes the regex engine to match a empty character.

edited Aug 27 '14 at 14:22

answered Aug 27 '14 at 14:15

Avinash Raj

172,303
28
230
274

Thanks for pointing me in the right direction, I have used this to solve the issue. SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual); – Ankit Aug 27 '14 at 14:25
@Ankit: Could you unmark this as an answer, and post your solution as an answer. Since Avinash Raj's answer is wrong. – Wouter Aug 14 '15 at 11:50
@Avinash Raj: On what Oracle version did you test this? I'm getting one NULL value for each comma in the string, so the P-values end up in the 10th capture group. I get the S-value in the 5th capture group. Using Oracle 12c – Wouter Aug 14 '15 at 11:53

score 1 · Accepted Answer · answered Aug 14 '15 at 13:14

Thanks for pointing me in the right direction, I have used this to solve the issue.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt , REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual );

Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

score 1 · Answer 6 · answered Jan 05 '23 at 15:08

I have a generic use case where I don't know the exact columns coming in the string. I thus used below code which solved the purpose.

function substring_specific_occurence(p_string varchar2
                                    ,p_delimiter varchar2
                                    ,p_occurence number) return varchar2
is 
    l_output varchar2(2000);
    g_miss_char     varchar2(20) := 'fdkjkjhkuhhf7';
    l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );

begin 

    while  (l_string like '%'||p_delimiter||p_delimiter||'%' )
    loop 
        l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
    end loop;

    select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence) 
    into l_output
    from dual;

    return replace(l_output,g_miss_char);

end substring_specific_occurence;

Oracle REGEX_SUBSTR Not Honoring null values

6 Answers6

Linked