0

For this data in a packed field: ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVENTHEBAND|TWELVE

I have this regexp: REGEXP_SUBSTR(the_packed_field_column , '((|)([[:alpha:] ]*)(|))' , 1 , 1) <--gives the expected result of TWO

but this regexp gives a result of |FOUR| REGEXP_SUBSTR(the_packed_field_column , '((|)([[:alpha:] ]*)(|))' , 1 , 2)

What SQL will work to split out specific values with no surprises?

UPDATE - with a great assist from blhsing and Gary_W I have this solved. Again, the challenge is to parse specific values out of a packed field with pipe delimiter:

ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE|TEN|ELEVEN_THE_BAND|TWELVE

but can also be: ONE|TWO||||SIX||EIGHT||TEN|ELEVEN_THE_BAND|TWELVE

might even be: ONE|TWO||||||||||TWELVE

For anyone who needs the REGEXP to parse a value out of a packed field with pipe delimiter, this is the basic pattern for all but the last:

'([[:alpha:]_]*)(|)' <--- in English this is: zero or more alpha characters plus also a _, then a literal pipe (wow - annoying - StackOverflow took out the backslash that makes it a literal pipe)

giving you this REGEXP_SUBSTR for the first eleven REGEXP_SUBSTR(the_column , '([[:alpha:]_]*)(|)' , 1 , n)

might also be REGEXP_SUBSTR(the_column , '([[:alpha:]_[:digit:]]*)(|)' , 1 , n)

the last one, TWELVE in this example, gets a slightly different pattern: '(backslash|)([[:alpha:]_]*)' , 1 , 11) <--- typed "backslash" because StackOverflow took out the \ that I typed (what??)

huge credit to anybody who knows how to exclude the pipe delimiters in the regexp, I had to do a REPLACE '|', ''

mojave
  • 21
  • 4

2 Answers2

1

Your regex includes both a leading pipe and a trailing pipe, so each match consumes two pipes, the latter of which should belong to the next token.

For the purpose of selecting the n-th word from a given string, matching a sequence of alphabets should be enough:

REGEXP_SUBSTR(the_packed_field_column, '[[:alpha:]]+' , 1 , 3)

which gives you:

THREE

Demo: https://www.mycompiler.io/view/JmZrhY2ugP8

blhsing
  • 91,368
  • 6
  • 71
  • 106
  • This form of regex does not handle NULL elements. Try with ONE||THREE|FOUR and select the 3rd element. You'll really get the 4th element returned: 'FOUR'. – Gary_W Mar 16 '23 at 12:25
  • 1
    thanks blhsing. When I saw the word "consume" I knew what was wrong – mojave Mar 17 '23 at 04:44
1

If you are using Oracle, use this form of REGEXP_SUBSTR() which handles NULL elements. For more information, see this post

REGEXP_SUBSTR(the_packed_field_column, '(.*?)(\||$)', 1 , 3, NULL, 1)

The regex returns the 3rd instance of an optional set of characters followed by a literal pipe or the end of the line. Then, returns the first group of that 3rd set. If a previous element is NULL, you will still get the 3rd element.

Gary_W
  • 9,933
  • 1
  • 22
  • 40