-1

I have always try to do quick Regex query on XML tags. Yep, I have been told it is not a good idea an you should load it in an object but sometimes like this, it is a in Oracle db blob. I need to get the VALUE for a specific NAME in a name value pair XML like this :

<entry>
  <string>NAME</string>
  <string>VALUE</string>
</entry>

Is there a way to do this with REGEX

mkirouac
  • 113
  • 1
  • 12
  • Use `XMLQUERY` or `XMLTABLE`. [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=8d527d790d16e869f6f22cfda281e01e) – MT0 Nov 06 '20 at 19:25
  • Or, using a `BLOB` [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=d28c58092328ee8ff658a7712fe005d0) – MT0 Nov 06 '20 at 19:35
  • @kjhughes None of those questions are duplicates of this one. Yes, using a regular expression to parse XML is not always/often going to to work and the OP should use an XML parser but if you are going to close a question as a duplicate then you should find a question that it duplicates. – MT0 Nov 06 '20 at 20:00
  • @MT0: Nearly every question that seeks to parse XML with regex is not only a duplicate, but misguided and absurdly over-asked, over-answered, over-debated to death. One of the duplicate links I added taught how to use XPath in Oracle, an eminently more reasonable approach. You're very gracious to offer your time to customize that approach for OP, and you're wise not to indulge his request to use regex. +1 to you. – kjhughes Nov 06 '20 at 22:31

1 Answers1

3

You should parse XML using an XML Parser and in Oracle you can use XMLQUERY:

SELECT XMLQUERY(
         '/root/entry/string[1][text()="NAME2"]/../string[2]/text()'
         PASSING XMLTYPE( xml, NLS_CHARSET_ID('UTF8') )
         RETURNING CONTENT
       ) AS value
FROM   table_name;

Or XMLTABLE:

SELECT value
FROM   table_name
       CROSS APPLY XMLTABLE(
         '/root/entry'
         PASSING XMLTYPE( xml, NLS_CHARSET_ID('UTF8') )
         COLUMNS
           name  VARCHAR2(20) PATH './string[1]',
           value VARCHAR2(20) PATH './string[2]'
       )
WHERE  name = 'NAME2';

Which for the sample data:

CREATE TABLE table_name ( xml BLOB );

DECLARE
  value        CLOB := '<root>
  <entry>
    <string>NAME1</string>
    <string>VALUE1</string>
  </entry>
  <entry>
    <string>NAME2</string>
    <string>VALUE2</string>
  </entry>
  <entry>
    <string>NAME3</string>
    <string>VALUE3</string>
  </entry>
  <entry>
    <string>NAME4</string>
    <string>VALUE4</string>
  </entry>
</root>';

  dest_offset  INTEGER := 1;
  src_offset   INTEGER := 1;
  lang_context INTEGER := DBMS_LOB.DEFAULT_LANG_CTX;
  result       BLOB;
  warning      INTEGER;
  warning_msg  VARCHAR2(50);
BEGIN
  DBMS_LOB.CreateTemporary(
    lob_loc => result,
    cache   => TRUE
  );

  DBMS_LOB.CONVERTTOBLOB(
    dest_lob     => result,
    src_clob     => value,
    amount       => LENGTH( value ),
    dest_offset  => dest_offset,
    src_offset   => src_offset,
    blob_csid    => DBMS_LOB.DEFAULT_CSID,
    lang_context => lang_context,
    warning      => warning
  );

  INSERT INTO table_name ( xml ) VALUES ( result );
END;
/

Both outputs:

| VALUE  |
| :----- |
| VALUE2 |

Can you do it with a regular expression? Yes:

SELECT REGEXP_SUBSTR(
         TO_CLOB( xml ),
         '<entry>\s*<string>NAME2</string>\s*<string>([^<]*)</string>\s*</entry>',
         1,
         1,
         'c',
         1
       ) AS value
FROM   table_name

Which outputs:

| VALUE  |
| :----- |
| VALUE2 |

db<>fiddle here

However, you shouldn't as the XML parsing functions take an XPATH which specifies where it should look for the data. The regular expression will just treat the value as a string and look for the first match even if it is not in the expected place in the XML hierarchy.

For example, if your data is:

<root>
  <entry>
    <string>NAME1</string>
    <string>VALUE1</string>
    <other><entry><string>NAME2</string><string>NOT THIS</string></entry></other>
  </entry>
  <entry>
    <string>NAME2</string>
    <string>VALUE2</string>
  </entry>
</root>

Then XMLQUERY and XMLTABLE will find the correct value but the regular expression outputs:

| VALUE    |
| :------- |
| NOT THIS |

db<>fiddle here

Or, if your data suddenly has an attribute:

<root>
  <entry>
    <string>NAME1</string>
    <string>VALUE1</string>
  </entry>
  <entry>
    <string>NAME2</string>
    <string attr="attr value">VALUE2</string>
  </entry>
</root>

Then parsing with the regular expression will fail and return NULL.

db<>fiddle here

So, don't use a regular expression, use a proper XML parser.

MT0
  • 143,790
  • 11
  • 59
  • 117