I need some help with a regex that does not work properly:

/(?<=([H|h][i|I])+\w+\>)(.*)(?=(\<))/

I have a few XML documents, and I need to extract the errorMessage and the errorCode from them. Not all of the documents use the same syntax: the tag is sometimes named errorMessage, sometimes ERRORTEXT, and sometimes Error_Messages.

An example:

<?xml version="1.0" encoding="UTF-8"?>
<n0:szemelyKutyaFuleResponse xmlns:prx="urn:sap.comproxy:SWP:/1SAI/TREASE1243804269AE457508F4:753" xmlns:n0="http://csajgeneratorws.tny.interfesz.kok.lo/">
    <return>
        <tanzakciosAzonosito>46981682-4637-49d2-bd4d-dcfff543742ed</tanzakciosAzonosito>
        <eredmeny>HIBAS</eredmeny>
        <errorCode>TSH08</errorCode>
        <errorMessage>Azonosítószám már hozzá lett rendelve üzleti partnerhez</errorMessage>
    </return>
</n0:szemelyKutyaFuleResponse>

I think I need to create two regexes:

  • One to find the text TSH08 in errorCode
  • and another regex to find Azonosítószám már hozzá lett rendelve üzleti partnerhez in errorMessage!

Please help, thanks.

Sandra Rossi
  • 2
    Why regex? Use XML classes for such requirements. – Maciej Los Feb 16 '21 at 08:06
  • 1
    Use an XML parser, then you can easily traverse the hierarchy – Peter Thoeny Feb 16 '21 at 08:47
  • The XML documents are stored in a HANA 4 DB; this is part of a program developed in ABAP. I need this information to store later in an ALV list or put back into a table... – Erzsébet Gombkötő Feb 16 '21 at 10:01
  • I don't have a fixed hierarchy in the XML structure. As I wrote before, the tags are only similar, not identical, and the same goes for the XML hierarchy. Please help me write a proper regex! – Erzsébet Gombkötő Feb 16 '21 at 10:05
  • 1
    Parsing XML using regular expressions is almost as futile as [parsing HTML with regular expressions](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You should really use an XML parser for this. Fortunately there is already a bunch of standard classes for this: The [iXML library](https://help.sap.com/doc/abapdocu_750_index_htm/7.50/en-US/index.htm?file=abenabap_ixml_lib.htm). – Philipp Feb 18 '21 at 14:28

1 Answer

If you just want the content of each tag, which is what I understood from your question, then perhaps something like these would work:

For the first regex:

  • <errorCode>([^<>]+)</errorCode>

  • (?<=<errorCode>)[^<>]+(?=</errorCode>)

For the second regex:

  • <errorMessage>([^<>]+)</errorMessage>

  • (?<=<errorMessage>)[^<>]+(?=</errorMessage>)
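To illustrate how the two variants differ, here is a minimal sketch in Python (chosen only for illustration; the question is about ABAP, and whether lookbehind is available depends on the regex engine in use). The sample text is trimmed from the question, and the variable names are made up for this example.

```python
import re

# Sample response, trimmed from the question (namespace attributes omitted).
xml = """<return>
    <eredmeny>HIBAS</eredmeny>
    <errorCode>TSH08</errorCode>
    <errorMessage>Azonosítószám már hozzá lett rendelve üzleti partnerhez</errorMessage>
</return>"""

# Variant 1: the tags are part of the match, the value sits in capture group 1.
code_from_group = re.search(r"<errorCode>([^<>]+)</errorCode>", xml).group(1)

# Variant 2: lookarounds keep the tags out of the match, so the whole match is the value.
code_from_lookaround = re.search(r"(?<=<errorCode>)[^<>]+(?=</errorCode>)", xml).group(0)

# Same idea for the message, using the capture-group variant.
message = re.search(r"<errorMessage>([^<>]+)</errorMessage>", xml).group(1)

print(code_from_group, code_from_lookaround, message)
```

The capture-group variant is the more portable of the two, since it does not rely on lookbehind support.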

You can also merge them with a | between the two if you don't care which tag the value came from.
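As a rough sketch of the merged form (again in Python purely for illustration, with a made-up input string): each alternative has its own capture group, so only one of the two groups is filled per match.

```python
import re

# Both capture-group patterns joined with | into a single pattern.
merged = re.compile(
    r"<errorCode>([^<>]+)</errorCode>|<errorMessage>([^<>]+)</errorMessage>"
)

# Hypothetical input, shortened for the example.
fragment = "<errorCode>TSH08</errorCode><errorMessage>some text</errorMessage>"

# For every match exactly one group is set; take whichever one it is.
values = [m.group(1) or m.group(2) for m in merged.finditer(fragment)]
print(values)
```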

A | can also be used if the tag's name may differ, like this: <(?:errorMessage|ERRORTEXT|Error_Messages)>([^<>]+)</(?:errorMessage|ERRORTEXT|Error_Messages)>
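A possible variation on this, sketched in Python for illustration: instead of repeating the alternation in the closing tag, capture the tag name and refer back to it with \1, so that an opening tag is only paired with a closing tag of the same name. This is a swapped-in technique, not the pattern above, and the names TAG_NAMES and extract_messages are hypothetical.

```python
import re

# Tag names mentioned in the question; extend the list as more variants turn up.
TAG_NAMES = ["errorMessage", "ERRORTEXT", "Error_Messages"]

# Group 1 captures the tag name, group 2 the content; \1 requires the
# closing tag to repeat the name that the opening tag used.
pattern = re.compile(
    r"<(" + "|".join(re.escape(name) for name in TAG_NAMES) + r")>([^<>]+)</\1>"
)

def extract_messages(text):
    """Return (tag name, content) pairs for every matching element."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

# Hypothetical input: three valid variants plus one mismatched pair.
sample = (
    "<ERRORTEXT>a</ERRORTEXT>"
    "<Error_Messages>b</Error_Messages>"
    "<errorMessage>c</errorMessage>"
    "<errorMessage>d</ERRORTEXT>"
)
print(extract_messages(sample))
```

The mismatched pair at the end of the sample is skipped, which the original alternation-only pattern would accept.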

Omar Si