I have the following XML:

<person>
    <id>1</id>
    <name>John</name>
    <phone>235 234</phone>
    <address>
        <street>1</street>
        <city>A</city>
        <state>B</state>
        <country>C</country>
    </address>
</person>

I converted this XML to a string. The XML is dynamic: some documents contain all of these elements, some are missing one or more of them, and some contain additional elements.

Given the XML string, I want to write a regular expression that checks whether a given element (supplied as input) is present in the string.

How can I write a regular expression for this?

Achaius
  • 6
    Why not use an XML parser? RE isn't really the way to go. If you want to test regular expressions, try rubular.com or regexpal.com – blueygh2 Jun 20 '14 at 06:54
  • If you have converted it into string then use `String.contains()` or `String.indexOf()` methods. – Braj Jun 20 '14 at 06:56
  • 4
    You know http://stackoverflow.com/a/1732454/1907906, don't you? –  Jun 20 '14 at 06:57
  • 1
    If your problem is *only* to see if an element exists, you can use the `contains("")` method as indicated by Braj. If your problem will get any more complex and require accessing the XML content, you should be using a parser, as blueygh2 noted. – beerbajay Jun 20 '14 at 06:58
  • I don't want to use a parser, because sometimes an HTML file is converted to the string instead of XML. If I use a parser, it will throw a ParserException, which I want to avoid. My intention is to check whether the string is HTML or XML before parsing it with SAXParser. – Achaius Jun 20 '14 at 07:02
  • Does the xml start with an xml declaration as per [Does a valid XML file require an xml declaration?](http://stackoverflow.com/questions/7007427/does-a-valid-xml-file-require-an-xml-declaration) – Steve C Jun 20 '14 at 07:27
  • I want to solve http://stackoverflow.com/questions/24322234/how-to-find-the-given-string-is-a-rss-feed-or-not – Achaius Jun 20 '14 at 07:45
  • 2
    You can't. There is no regular expression that will match all legal ways of writing this XML (including allowed variations) that will not also match something else. That's a theoretically-provable result. – Michael Kay Jun 20 '14 at 08:17

1 Answer

1

All the comment writers are right. There are better methods than using a regular expression search to find out if an XML element contains a specified element or its tag.
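As a rough illustration of the parser approach suggested in the comments, here is a minimal Java sketch. The class name ElementCheck and the method hasElement are hypothetical names chosen for this example, and it assumes that returning false for input that is not well-formed XML (for example HTML) is acceptable. Hardening against external entities and DTDs is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class ElementCheck {

    /**
     * Returns true if text is well-formed XML that contains at least one
     * element named elementName; returns false if it does not, or if the
     * text cannot be parsed as XML at all.
     */
    static boolean hasElement(String text, String elementName) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler, which prints to stderr.
            // DefaultHandler still throws on fatal errors.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(text)));
            return doc.getElementsByTagName(elementName).getLength() > 0;
        } catch (SAXException e) {
            // Not well-formed XML (for example, tag-soup HTML).
            return false;
        } catch (Exception e) {
            // ParserConfigurationException or IOException.
            return false;
        }
    }
}
```

Because the parse exception is caught inside the method, the caller never sees it: calling ElementCheck.hasElement(xmlString, "phone") simply answers false when the input turns out to be HTML that is not well-formed XML.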

But if you really want to do this with a regular expression search, the following expression can be used for your example:

<person>(?:(?!</person>)[\S\s])+<XXX\b(?:(?!</person>)[\S\s])+</person>

This expression matches everything from the start tag <person> to the end tag </person> if it contains <XXX, where XXX is the name of the element to find within the person element.

Note: This regular expression works only if the person element does not itself contain another person element and there is no CDATA section containing </person>, <person or <XXX. It also requires at least one character, such as whitespace, between <person> and <XXX.

The expression only checks for the start tag of element XXX and does not check for an end tag, because the question does not make clear whether every element must have both a start and an end tag or whether some may be empty elements of the form <XXX />.
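Since the comments mention Java string methods, here is a sketch of how the expression could be used from Java. The class name PersonRegexCheck and the method personContains are hypothetical names for this example. The element name is passed through Pattern.quote on the assumption that it is a literal name and not a pattern of its own.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class PersonRegexCheck {

    /**
     * Returns true if xml contains a <person>...</person> block in which
     * the start tag of elementName occurs. The limitations described above
     * (nested person elements, CDATA sections) apply unchanged.
     */
    static boolean personContains(String xml, String elementName) {
        // Quote the name so that characters such as '.' are taken literally.
        String name = Pattern.quote(elementName);
        Pattern pattern = Pattern.compile(
                "<person>(?:(?!</person>)[\\S\\s])+<" + name
                + "\\b(?:(?!</person>)[\\S\\s])+</person>");
        // find() searches anywhere in the string; matches() would require
        // the whole string to be the person element.
        return pattern.matcher(xml).find();
    }
}
```

With the XML from the question, personContains(xml, "phone") finds the element, while a name that is only a prefix of a tag name, such as "ph", is rejected by the \b word boundary.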

For an explanation of this regular expression read my answer on Deleting duplicate values using find and replace in a text editor.

Mofi