0

I'm splitting a string on pipes (|) and the regex [^|]* is working fine when there are missing fields, but it's matching null characters after words:

GARBAGE|||GA|30604 yields matches

GARBAGE, null, null, null, GA, null, 30604, null

I've also tried [^|]+ which yields matches

GARBAGE, GA, 30604

EDIT: What I want is GARBAGE, null, null, GA, 30604. And |||| would yield matches null, null, null, null, null.

I'm referencing matches by index, so how can I fix the regex so it matches field by field, including nulls if there is no other data in the field?

benpva16
  • 446
  • 6
  • 26

2 Answers2

1

This is how split works. You should use a split type function.
There is always a bias split uses.

Your case is simple in that it splits on a single character, in normal cases a regex is not needed.
And in this case, using a regex, the bias cannot be achieved without lookahead/lookbehind.

 # (?:^|(?<=\|))([^|]*)(?=\||$)

 (?:
      ^                     # BOS
   |                      # or
      (?<= \| )             # Pipe behind
 )
 ( [^|]* )             # (1), Optional non-pipe chars
 (?=
      \|                    # Pipe ahead
   |                      # or
      $                     #  EOS
 )
  • This works perfectly at rubular.com, but PL/SQL is matching nothing but nulls with `v_fullname := REGEXP_SUBSTR(input_line, v_regex, 1, 2);`. Do you have any insight as to why this might be? – benpva16 Jun 26 '14 at 13:15
  • @benjsigmon - Sorry, I don't know pl/sql. But, looking at the regex, you don't need the capture group. And for a test you could put the escaped pipe into a stand-alone class. Also, make sure the v_regex string is proper. It should be the raw regex when printed. `(?:^|(?<=[|]))[^|]*(?=[|]|$)` –  Jun 27 '14 at 19:58
1

While not exactly what you want, perhaps you could turn it into rows and work with it that way:

select nvl(regexp_substr( str, '([^|]*)\|{0,1}', 1, level, 'i', 1 ), 'null') part
from ( select 'GARBAGE|||GA|30604' str from dual )
connect by level <= regexp_count( str, '\|' ) + 1;

enter image description here

Specify which row (field) by adding a where clause where level equals the row (field) you want:

select nvl(regexp_substr( str, '([^|]*)\|{0,1}', 1, level, 'i', 1 ), 'null') part
from ( select 'GARBAGE|||GA|30604' str from dual )
where level = 4
connect by level <= regexp_count( str, '\|' ) + 1;

enter image description here

Gary_W
  • 9,933
  • 1
  • 22
  • 40