Vertica REGEXP_SUBSTR use /g flag

Question

I am trying to extract all occurrences of a word before '=' in a string, i tried to use this regex '/\w+(?=\=)/g' but it returns null, when i remove the first '/' and the last '/g' it returns only one occurrence that's why i need the global flag, any suggestions?

`REGEXP_SUBSTR` only returns the first match. You need a workaround [like this one](https://stackoverflow.com/questions/17596363/oracle-11g-get-all-matched-occurrences-by-a-regular-expression). — Wiktor Stribiżew, Mar 06 '18 at 12:30

marcothesane · Answer 1 · 2018-03-12T12:08:31.517

As Wiktor pointed out, by default, you only get the first string in a REGEXP_SUBSTR() call. But you can get the second, third, fourth, etc. Embedded into SQL, you need to treat regular expressions differently from the way you would treat them in perl, for example. The pattern is just the pattern, modifiers go elsewhere, you can't use $n to get the n-th captured sub-expression, and you need to proceed in a specific way to get the n-th match of a pattern, etc.

The trick is to CROSS JOIN your queried table with an in-line created index table, consisting of as many consecutive integers as you expect occurrences of your pattern - and a few more for safety. And Vertica's REGEXP_SUBSTR() call allows for additional parameters to do that. See this example:

WITH
-- one exemplary input row; concatenating substrings for 
-- readability
input(s) AS (
SELECT 'DRIVER={Vertica};COLUMNSASCHAR=1;CONNECTIONLOADBALANCE=True;'
     ||'CONNSETTINGS=set+search_path+to+public;DATABASE=sbx;'
     ||'LABEL=dbman;PORT=5433;PWD=;SERVERNAME=127.0.0.1;UID=dbadmin;'
)
,
-- an index table to CROSS JOIN with ... maybe you need more integers ...
loop_idx(i) AS (
      SELECT  1
UNION SELECT  2
UNION SELECT  3
UNION SELECT  4
UNION SELECT  5
UNION SELECT  6
UNION SELECT  7
UNION SELECT  8
UNION SELECT  9
UNION SELECT 10
)
,
-- the query containing the REGEXP_SUBSTR() call
find_token AS (
SELECT
  i                    -- the index from the in-line index table, needed 
                       -- for ordering the outermost SELECT 
, REGEXP_SUBSTR (
    s                  -- the input string
  , '(\w+)='           -- the pattern  - a word followed by an equal sign; capture the word
  , 1                  -- start from pos 1
  , i                  -- the i-th occurrence of the match
  , ''                 -- no modifiers to regexp
  , 1                  -- the first and only sub-pattern captured
  ) AS token
  FROM input CROSS JOIN loop_idx -- the CROSS JOIN with the in-line index table
)
-- the outermost query filtering the non-matches - the empty strings - away...
SELECT
  token
FROM find_token
WHERE token <> ''
ORDER BY i
;

The result will be one row per found pattern:

token
DRIVER
COLUMNSASCHAR
CONNECTIONLOADBALANCE
CONNSETTINGS
DATABASE
LABEL
PORT
PWD
SERVERNAME
UID

You can do all sorts of things in modern SQL - but you need to stick to the SQL and to the relational paradigm - that's all ...

Happy playing ...

Marco

Vertica REGEXP_SUBSTR use /g flag

1 Answers1