I have a table with a string column. From each string I want to extract every term that comes immediately after a certain substring. For this minimal reproducible example, assume the substring is `abc`, so I want every term that follows `abc`.
I can do this when a row contains only one `abc`, but my logic fails when a row contains several. I can also count the occurrences, but I'm having trouble using that count to retrieve all of them.
My approach/attempt:
I created a temp table that stores the number of regex matches in each `main_txt`:
```sql
CREATE TEMP TABLE match_count AS (
SELECT DISTINCT id, main_txt, regexp_count(main_txt, 'abc (\\S+)', 1) AS cnt
FROM my_data_source
WHERE regexp_count(main_txt, 'abc (\\S+)', 1) > 0);
```
My output:
| id | main_txt | cnt |
|----|----------|-----|
| 1 | wpfwe abc weiofnew abc wieone | 2 |
| 2 | abc weoin | 1 |
| 3 | abc weoifn abc we abc w | 3 |
To get my final output, I have a query like:
```sql
SELECT id, main_txt, regexp_substr(main_txt, 'abc (\\S+)', 1, cnt, 'e') AS output
FROM match_count;
```
My actual final output:
| id | main_txt | output |
|----|----------|--------|
| 1 | wpfwe abc weiofnew abc wieone | wieone |
| 2 | abc weoin | weoin |
| 3 | abc weoifn abc we abc w | w |
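The actual output can be reproduced outside the database with a rough Python emulation. This is a sketch that assumes `regexp_substr(source, pattern, 1, occurrence, 'e')` returns capture group 1 of the occurrence-th match; `regexp_substr_e` is a hypothetical helper, and Python's `re` stands in for the database's regex engine.

```python
import re

# Hypothetical stand-in for regexp_substr(source, pattern, 1, occurrence, 'e'):
# return capture group 1 of the occurrence-th match (1-based), or None if absent.
def regexp_substr_e(source: str, pattern: str, occurrence: int):
    matches = list(re.finditer(pattern, source))
    if occurrence < 1 or occurrence > len(matches):
        return None
    return matches[occurrence - 1].group(1)

# Same pattern as the query, written as a Python raw string (no SQL-literal escaping).
PATTERN = r"abc (\S+)"

rows = {
    1: "wpfwe abc weiofnew abc wieone",
    2: "abc weoin",
    3: "abc weoifn abc we abc w",
}

for row_id, text in rows.items():
    cnt = len(re.findall(PATTERN, text))  # plays the role of regexp_count
    # Passing cnt as the occurrence selects exactly one match per row: the last one.
    print(row_id, regexp_substr_e(text, PATTERN, cnt))
```

Because each row is evaluated once with a single occurrence value, the query can only ever produce one output row per id.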
My expected final output:
| id | main_txt | output |
|----|----------|--------|
| 1 | wpfwe abc weiofnew abc wieone | weiofnew |
| 1 | wpfwe abc weiofnew abc wieone | wieone |
| 2 | abc weoin | weoin |
| 3 | abc weoifn abc we abc w | weoifn |
| 3 | abc weoifn abc we abc w | we |
| 3 | abc weoifn abc we abc w | w |
So my query returns only the last match, because it passes `cnt` as the occurrence argument. How can I modify it to return one row per match?
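One common approach is to join each row against a sequence of occurrence numbers 1..cnt and pass that number, rather than `cnt`, as the occurrence argument. The sketch below tries that query shape locally in SQLite from Python. SQLite has no `regexp_count` or `regexp_substr`, so both are hypothetical Python stand-ins registered with `create_function`, emulating only the position, occurrence and `'e'` arguments. The recursive `numbers` CTE and its cap of 100 matches per row are assumptions as well.

```python
import re
import sqlite3

# Hypothetical stand-ins for the database's REGEXP_COUNT / REGEXP_SUBSTR, registered
# as SQLite user functions. Only position, occurrence and the 'e' flag
# (return the first capture group) are emulated.
def regexp_count(source, pattern, position):
    return len(list(re.finditer(pattern, source[position - 1:])))

def regexp_substr(source, pattern, position, occurrence, params):
    matches = list(re.finditer(pattern, source[position - 1:]))
    if occurrence < 1 or occurrence > len(matches):
        return None
    m = matches[occurrence - 1]
    return m.group(1) if "e" in params else m.group(0)

conn = sqlite3.connect(":memory:")
conn.create_function("regexp_count", 3, regexp_count)
conn.create_function("regexp_substr", 5, regexp_substr)

conn.execute("CREATE TABLE my_data_source (id INTEGER, main_txt TEXT)")
conn.executemany(
    "INSERT INTO my_data_source VALUES (?, ?)",
    [
        (1, "wpfwe abc weiofnew abc wieone"),
        (2, "abc weoin"),
        (3, "abc weoifn abc we abc w"),
    ],
)

# Same temp table as in the question; SQLite literals need no doubled backslash.
conn.execute(r"""
    CREATE TEMP TABLE match_count AS
    SELECT DISTINCT id, main_txt, regexp_count(main_txt, 'abc (\S+)', 1) AS cnt
    FROM my_data_source
    WHERE regexp_count(main_txt, 'abc (\S+)', 1) > 0
""")

# Key change: join each row to the numbers 1..cnt and pass numbers.n (not cnt)
# as the occurrence argument, producing one output row per match.
rows = conn.execute(r"""
    WITH RECURSIVE numbers(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM numbers WHERE n < 100
    )
    SELECT m.id, m.main_txt,
           regexp_substr(m.main_txt, 'abc (\S+)', 1, numbers.n, 'e') AS output
    FROM match_count AS m
    JOIN numbers ON numbers.n <= m.cnt
    ORDER BY m.id, numbers.n
""").fetchall()

for row in rows:
    print(row)
```

On the real database, the pattern literal would presumably stay `'abc (\\S+)'` as in the original query. Whether a recursive CTE is supported there (versus a pre-built numbers table) should be checked against that engine's documentation; the join on `numbers.n <= m.cnt` is the part that carries over.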