Oracle 11g get all matched occurrences by a regular expression

Question

I'm using Oracle 11g and I would like to use the REGEXP_SUBSTR to match all the occurrences for a given pattern. For example

 SELECT
  REGEXP_SUBSTR('Txa233141b Ta233141 Ta233142 Ta233147 Ta233148',
  '(^|\s)[A-Za-z]{2}[0-9]{5,}(\s|$)') "REGEXP_SUBSTR"
  FROM DUAL;

returns only the first match Ta233141 but I would like to return the other occurrences that match the regex, meaning Ta233142 Ta233147 Ta233148.

Alex Poole · Accepted Answer · 2016-03-30T17:00:39.260

REGEXP_SUBSTR only returns one value. You could turn your string into a pseudo-table and then query that for matches. There's an XML-based way of doing this that escapes me at the moment, but using connect-by works, as long as you only have one source string:

SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
FROM (
    SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str FROM DUAL
)
CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1;

... gives you:

SUBSTR             
--------------------
Txa233141b           
Ta233141             
Ta233142             
Ta233147            
Ta233148

... and you can filter that with a slightly simpler version of your original pattern:

SELECT substr
FROM (
    SELECT REGEXP_SUBSTR(str, '[^ ]+', 1, LEVEL) AS substr
    FROM (
        SELECT 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS str
        FROM DUAL
    )
    CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) + 1
)
WHERE REGEXP_LIKE(substr, '^[A-Za-z]{2}[0-9]{5,}$');

SUBSTR             
--------------------
Ta233141             
Ta233142             
Ta233147             
Ta233148

Which isn't very pretty, but neither is holding multiple values in one field.

this code doesnot return Ta233148. To get that change CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+')) to CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(str, '[^ ]+'))+1 — sakthi, Mar 19 '14 at 07:47
Using this code directly with anything but a simple splitter character will return way to many results. Replace LENGTH(REGEXP_REPLACE(str, '[^ ]+'))+1 with REGEXP_COUNT(str, '[^ ]+') and you will get the expected result. — bkqc, Jan 23 '18 at 21:15

Steven Cochran · Answer 2 · 2020-05-15T15:20:32.043

This is a little late, but I needed basically the same thing and could not find a good snippet. I needed to search a free text column of a table for some terms and collect them. As this might be useful to another I have included a version based on this question. While REGEXP_SUBSTR only returns one value, Oracle also provides a REGEXP_COUNT to tell you how many matching items are present in a given string, therefore you can join this with a list of indexes to select each as follows (with examples from this query as free text from some 'source_table'):

DEFINE MATCH_EXP = "'(^|\s)[A-Za-z]{2}[0-9]{5,}'"

WITH source_table
     -- Represents some DB table with a 'free_text' column to be checked.
     AS (       ( SELECT 'Txa233141 Ta233141 Ta232 Ta233142 Ta233141 Ta233148'
                             AS free_text FROM dual )
          UNION ( SELECT 'Other stuff PH33399 mixed in OS4456908843 this line'
                             AS free_text FROM dual )
        )
   , source
     -- For some table, select rows of free text and add the number of matches
     -- in the line.
     AS ( SELECT cnt
               , free_text
          FROM ( SELECT RegExp_Count(free_text, &MATCH_EXP) AS cnt
                      , free_text 
                 FROM source_table )
          WHERE cnt > 0 )
   , iota
     -- Index generator
     AS ( SELECT RowNum AS idx
          FROM dual
          CONNECT BY RowNum <= ( SELECT Max(cnt) FROM source ) )
-- Extract the unique 'cnt' matches from each line of 'free_text'.
SELECT UNIQUE
       RegExp_SubStr(s.free_text, &MATCH_EXP, 1, i.idx) AS result
FROM   source s
  JOIN iota i
    ON ( i.idx <= s.cnt )
ORDER BY result ASC
;

It has the advantages of working for any list of selected rows and uses the CONNECT BY minimally (as this can be very slow).

score 2 · Answer 3 · edited Jun 09 '15 at 11:11

2

How about adding a function that will loop through and return all of the values?

create or replace function regexp_substr_mr (
  p_data clob,
  p_re varchar
  )
return varchar as
  v_cnt number;
  v_results varchar(4000);
begin
  v_cnt := regexp_count(p_data, p_re, 1,'m');
  if v_cnt < 25 then
    for i in 1..v_cnt loop
      v_results := v_results || regexp_substr(p_data,p_re,1,i,'m') || chr(13) || chr(10);
    end loop;
  else 
    v_results := 'WARNING more than 25 matches found';
  end if;

  return v_results;
end;

Then just call the function as part of the select query.

edited Jun 09 '15 at 11:11

Daemoncoder

3
3

answered May 15 '14 at 16:10

PhilJ

21
1

1

In principle I prefer this approach as in my opinion it has more general application than the accepted solution. I would however improve it by passing in p_data, p_re, and flags and creating a type something like "create or replace type mytype is table of varchar2(4000)" and return that from the function. One could then embed it in SQL something like as follows: select A.id,X.column_value from someTable A cross join table(regexp_substr_mr(A.textToSearch,'regexp') X – Pancho Oct 20 '16 at 14:37

David E. Veliev · Answer 4 · 2016-12-10T00:03:21.413

I am fix @Alex Poole answer for multi-line source support and for more quickly executing:

with templates as (select '\w+' regexp from dual)
select 
    regexp_substr(str, templates.regexp, 1, level) substr
from (
    select 1 id, 'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' as str from dual
    union
    select 2 id, '2 22222222222222Ta233141 2Ta233142 2Ta233147' as str from dual
    union
    select 3 id, '3Txa233141b 3Ta233141 3Ta233142' as str from dual
)
join templates on 1 = 1
connect by 
    id = connect_by_root id
    and regexp_instr(str, templates.regexp, 1, level) > 0
order by id, level

Source lines:

ID  STR                                             
--  ----------------------------------------------  
1   Txa233141b Ta233141 Ta233142 Ta233147 Ta233148  
2   2 22222222222222Ta233141 2Ta233142 2Ta233147    
3   3Txa233141b 3Ta233141 3Ta233142

Result:

Txa233141b              
Ta233141                
Ta233142                
Ta233147                
Ta233148                
2                       
22222222222222Ta233141  
2Ta233142               
2Ta233147               
3Txa233141b             
3Ta233141               
3Ta233142

Your query return error: "ORA-30007: CONNECT BY ROOT operator is not supported in the START WITH or in the CONNECT BY condition". How to fix it? — tungns, Nov 13 '18 at 09:07

score 0 · Answer 5 · answered Nov 13 '18 at 09:53

Base on @David E. Veliev answer, this is query for multi rows input. If below query work for you, consider vote up original answer.

SELECT SUBSTR
  FROM (WITH TEMPLATES AS (SELECT '\w+' REGEXP FROM DUAL)
         SELECT ID,
                CONNECT_BY_ROOT ID CBR,
                LEVEL LVL,
                REGEXP_SUBSTR(STR, TEMPLATES.REGEXP, 1, LEVEL) SUBSTR
           FROM (SELECT 1 ID,
                        'Txa233141b Ta233141 Ta233142 Ta233147 Ta233148' AS STR
                   FROM DUAL
                 UNION
                 SELECT 2 ID,
                        '2 22222222222222Ta233141 2Ta233142 2Ta233147' AS STR
                   FROM DUAL
                 UNION
                 SELECT 3 ID,
                        '3Txa233141b 3Ta233141 3Ta233142' AS STR
                   FROM DUAL)
           JOIN TEMPLATES
             ON 1 = 1
         CONNECT BY REGEXP_INSTR(STR, TEMPLATES.REGEXP, 1, LEVEL) > 0)
          WHERE ID = CBR
          GROUP BY ID,
                   CBR,
                   LVL,
                   SUBSTR
          ORDER BY ID,
                   LVL;

Input:

ID  STR
==  ==============================================
1   Txa233141b Ta233141 Ta233142 Ta233147 Ta233148
2   2 22222222222222Ta233141 2Ta233142 2Ta233147
3   3Txa233141b 3Ta233141 3Ta233142

Output:

SUBSTR
======================
Txa233141b
Ta233141
Ta233142
Ta233147
Ta233148
2
22222222222222Ta233141
2Ta233142
2Ta233147
3Txa233141b
3Ta233141
3Ta233142

Sergio Barbosa · Answer 6 · 2022-07-14T19:29:43.407

0

SELECT  
  LISTAGG(REGEXP_SUBSTR('Txa233141b Ta233141 Ta233142 Ta233147 Ta233148', '(^|\s)[A-Z]{2}[0-9]{5,}',1,level) 
         ) WITHIN GROUP(ORDER BY 1)
  AS REGEXP_SUBSTR
FROM DUAL
CONNECT BY level<=regexp_count('Txa233141b Ta233141 Ta233142 Ta233147 Ta233148','(^|\s)[A-Z]{2}[0-9]{5,}')
;

edited Jul 14 '22 at 19:29

answered Jul 14 '22 at 19:25

Sergio Barbosa

1
1

1

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jul 16 '22 at 19:37

score -2 · Answer 7 · edited Dec 04 '15 at 19:09

-2

The below is the easy solution for your question.

SELECT REGEXP_SUBSTR('Txa233141b Ta233141 Ta233142 Ta233147 Ta233148',
  '([a-zA-Z0-9]+\s?){1,}') "REGEXP_SUBSTR"
  FROM DUAL;

edited Dec 04 '15 at 19:09

bummi

27,123
14
62
101

answered Dec 04 '15 at 19:02

Mohammed Zubair

1
1

That just returns the original string; it includes Txa233141b even though that starts with three non-numeric characters, not two as the OP's pattern requires. – Alex Poole Mar 30 '16 at 16:58

Oracle 11g get all matched occurrences by a regular expression

7 Answers7

Linked