-1

I need to extract a substring from a long string. I tried the following query but doesn't work , it returns me NULL, I want to extract the first value 12 between the <cc> and </cc>

select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>,'<CN>[^</CN>]*') 
"REGEXPR_SUBSTR"
FROM DUAL;

I get as a result <CN>ROSSI but I want also to eliminate also the <CN> , any suggestion?

Mr Lister
  • 45,515
  • 15
  • 108
  • 150
Xrew
  • 1
  • 1

3 Answers3

5

Don't use a regular expression to parse XML data; use a proper XML parser:

SELECT t.*
FROM   XMLTABLE(
         '/root'
         PASSING XMLTYPE(
           '<root>'
           || '<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>'
           || '</root>'
         )
         COLUMNS
           cc NUMBER PATH './CC',
           cn VARCHAR2(20) PATH './CN',
           no VARCHAR2(20) PATH './NO',
           "IN" VARCHAR2(50) PATH './IN'
       ) t

Which outputs:

CC | CN    | NO    | IN                
-: | :---- | :---- | :-----------------
 3 | ROSSI | MARIO | VIA DELLE MIMOSE 4

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
0

You may get ROSSI using

select regexp_substr('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>','<CN>([^<]*)</CN>', 1, 1, NULL, 1)

See the online Oracle demo.

The <CN>([^<]*)</CN> regex matches <CN>, then captures into Group 1 any zero or more chars other than < and then matches </CN>. Only the captured part is returned due to the last argument 1.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • This won't match anything if the `CN` element contains nested tags such as `ROSSI` [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=75d0fdf429e29acce1e43d92287d80db). – MT0 May 01 '20 at 22:56
  • @MT0 The tags OP shows are not nested. Otherwise, no regex in Oracle could help and I would have not shared this solution. – Wiktor Stribiżew May 01 '20 at 23:01
0

Use a subexpression (a matching group enclosed in parentheses) to grab what you want:

SELECT REGEXP_SUBSTR('<CC>3</CC><CN>ROSSI</CN><NO>MARIO</NO><IN>VIA DELLE MIMOSE 4</IN>',
                     '<CN>(.*?)</CN>', 1, 1, NULL, 1) 
  FROM DUAL;

Here we're telling REGEXP_SUBSTR that we want to match a string which begins with <CN>, is followed by a subexpression of any number of any characters (.*), and ends when </CN> is found. Because there's only a single subexpression ((.*?)) in the regular expression it's sub-expression number 1, which is indicated by the last parameter passed to REGEXP_SUBSTR above.

db<>fiddle here

  • If there is another `` tag in the input then this will not work [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=719f4e1b2e3312cac4206313b8a349cb). – MT0 May 01 '20 at 22:46
  • OK - so make the sub-expression non-greedy. Fixed up the answer, and changed the fiddle to show how the original greedy sub-expression wouldn't work as desired if there were multiple `'`s in the data (which, of course, is not what OP showed, but I'm game :-), and how using a non-greedy sub-expression makes it happy again. Of course, it doesn't capture the second `` and subsequent pairs, but as we're somewhat far afield of the original question we'll leave the solution to that problem as an exercise for the motivated reader. :-) – Bob Jarvis - Слава Україні May 02 '20 at 04:13