Need help on REGEXP_SUBSTR() function

Question

I have a query which gets the results from a column string which is between > and <\

Query is written as

SELECT REGEXP_SUBSTR ('Column','([^>]+)\</', 1, 1,NULL,1) from table

I understand 3rd parameter is the start position 4th is for occurrence

But not sure on 5th and 6th

Can any one explain how this query is working. I have looked at few blogs but the parameters are different.

what is ([^>]+)\</ , Null (5th) and 1 (6th)

[Literally second link in google for `REGEXP_SUBSTR`](https://www.oracletutorial.com/oracle-string-functions/oracle-regexp_substr/). 5th: no flags should be applied for this regex match, 6th: return group (content of parentheses) number 1 — markalex, Apr 04 '23 at 10:47
`([^>]+)\` is a regex. You can see what is a regex, and what does all this mean [here](https://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean/22944075#22944075) — markalex, Apr 04 '23 at 10:49
It is clearly described in the documentations, which is the most reliable source of information: [REGEXP_SUBSTR](https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/REGEXP_SUBSTR.html#GUID-2903904D-455F-4839-A8B2-1731EF4BD099). Moreover, it has additional links to explanations, examples etc. — astentx, Apr 04 '23 at 11:10

score 2 · Answer 1 · answered Apr 04 '23 at 10:59

Your regular expression is wrong and will not match "results from a column string which is between > and <\" as it never matches an initial >.

If you want to match between those then you need to include the leading > and then you can just match as few characters as possible (using the non-greedy pattern [^<>]*?) until you find </ (in the description you state \ and in the code you use /. I assume you want / as that is used in XML):

SELECT REGEXP_SUBSTR (
         'Column',
         '>([^<>]*?)</',
         1,
         1,
         NULL,
         1
       )
FROM   table_name

As for:

what is ([^>]+)\</, Null (5th) and 1 (6th)

The regular expression is incorrect as it will match one-or-more non-> characters until it finds the first </ sub-string.

The parameters are described in Oracle's REGEXP_SUBSTR documentation and are the match_param and subexp arguments. A complete description is available in the documentation but your arguments will use the default match parameters and return the contents of the first capturing group.

However

If you are really parsing XML (or XHTML) then you would be better using a proper XML parser such as XMLTABLE.

Thanks @MT0 , Script provided above is actually working in my environment. I am a beginner in pl sql so wanted to understand how it actually works. Yes , File is XML. I have tried to use XMLTABLE but the prob is database is in different server and i dont have access to it. I see XMLTABLE works fine by creating a directory and then placing the file in the location. With the knowledge i have, I was not able to achieve the requirement by passing the xml file contents to XMLTABLE so created a SQL custom query — nav, Apr 05 '23 at 08:21

Need help on REGEXP_SUBSTR() function

1 Answers1

However