0

I have a string in a column 'ABC1,ABC2,WWW1,WWW2,ABC3,WWW3,WWW4,ABC4' I need to extract words starting with ABC alone . The expected results should be

ABC1,ABC2,ABC3,ABC4

I tried the bellow but didnt work

select regexp_substr(split_part('ABC1,ABC2,WWW1,WWW2,ABC3,WWW3,WWW4,ABC4',',',1),'ABC[^"]*')

I am getting a the First occurrence only : as ABC1 . How to get all occurrence of ABC*

Sachin Sukumaran
  • 707
  • 2
  • 9
  • 25
  • When you say it didn't work, please mention exactly why – Yosi Dahari Sep 30 '19 at 12:47
  • @YosiDahari - Thanks for your comment. I am getting only the first occurance only - Need help in getting all occurance of the word . – Sachin Sukumaran Sep 30 '19 at 12:54
  • Do you have an upper bound of the number of items in the string? – Yosi Dahari Sep 30 '19 at 13:00
  • 1
    both `split_part` and `regexp_substr` return on the n`th occurrence of the match. Not all matches. To get what you want you need to CROSS JOIN with a table containing the numbers 1..100 (or whatever the maximum number of words you need to extract) and apply the functions to each index effectively turning the list into multiple rows. Then select the rows you need. See [here](http://torsten.io/stdout/expanding-json-arrays-to-rows/) for an example. – nimrodm Sep 30 '19 at 13:00
  • @YosiDahari : Number of items in the string = 20 , and number of occurrence of pattern(ABC*) can be from 0 - 4 – Sachin Sukumaran Sep 30 '19 at 13:07

1 Answers1

1

You are getting only one match since the functions you used (split_part,regexp_substr) returns the n'th element of the match.

The most naive approach to solve this:

select regexp_substr(split_part('ABC1,ABC2,WWW1,WWW2,ABC3,WWW3,WWW4,ABC4',',',1),'ABC[^"]*')
union all
select regexp_substr(split_part('ABC1,ABC2,WWW1,WWW2,ABC3,WWW3,WWW4,ABC4',',',2),'ABC[^"]*')
union all
...
select regexp_substr(split_part('ABC1,ABC2,WWW1,WWW2,ABC3,WWW3,WWW4,ABC4',',',20),'ABC[^"]*')

To improve this, you would need to first transform the comma delimited value to multiple rows, and then using the regex. This can be done in various ways, see this post for more details.

Result: ABC1 ABC2

The tricky part here, is the absence os a table-generating function to split the comma delimited value to multiple rows.

Yosi Dahari
  • 6,794
  • 5
  • 24
  • 44