
I have some content and I would like to know whether it is XML or not. How can I do that? I only need a true/false answer from a method. I plan to use regex but am open to better suggestions.

The XML content looks as follows and will always be in the same format (only the molecule IDs may increase or decrease):

<?xml version="1.0" encoding="UTF-8"?>
<molecules>
    <molecule id="1">
        <atoms>
            <atom id="1" symbol="C"/>
            <atom id="2" symbol="C"/>
            <atom id="3" symbol="N"/>
        </atoms>
        <bonds>
            <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
            <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
        </bonds>
    </molecule>
    <molecule id="2">
        <atoms>
            <atom id="1" symbol="C"/>
            <atom id="2" symbol="C"/>
            <atom id="3" symbol="N"/>
        </atoms>
        <bonds>
            <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
            <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
        </bonds>
    </molecule>
</molecules> 

I wrote the following regex to recognize the XML:

public static final String REGEX_FOR_XML = "((<(\\S(.*?))(\\s.*?)?>(.*?)<\\/\\3>)|(<\\S(.*?)(.*?)(\\/>)))";

The issue is that it only matches the inner content, while I would like it to match the entire content. I use this validator for matching:

public static boolean isValidXML(String inXMLStr) {

    if (inXMLStr == null || inXMLStr.isEmpty())
        return false;

    final Pattern pattern = Pattern.compile(Constants.REGEX_FOR_XML);
    if (pattern.matcher(inXMLStr).matches()) {
        return true;
    }
    return false;
}

How can I correct the regex so that it matches the whole XML content, or is there a better option?

Arefe
  • 5
    Why not just parse XML instead of using regex? – shree.pat18 Jan 09 '17 at 07:53
  • 1
    Do yourself a favour and do not use regex for this. It seems you want to do schema validation. Maybe this will help you: https://docs.oracle.com/javase/tutorial/jaxp/dom/validating.html Note that I am purposely not writing this as an answer, since it seems to be an XY problem: the suggestion does not answer your question but will hopefully solve your problem. – Fildor Jan 09 '17 at 08:11
  • @Baby I disagree with the duplicate. He does not seem to try and read the XML or get values out of it but merely validate it. – Fildor Jan 09 '17 at 08:17
  • That's correct, it's not a duplicate, I assume. – Arefe Jan 09 '17 at 08:18
  • @Fildor Ah I see. I guess I didn't read the question details correctly. Sorry – Baby Jan 09 '17 at 08:19
  • @Fildor how about writing it as an answer? It will help everyone here. – Arefe Jan 09 '17 at 08:20
  • 2
    Can't resist linking to this answer : http://stackoverflow.com/a/1732454/13075 – Henrik Aasted Sørensen Jan 09 '17 at 08:20

1 Answer


There is an infamous answer on using regex for XML parsing, which I will not link (@Henrik did anyway ;P) or go into. Bottom line: regex is very rarely a good idea for XML validation (or parsing, for that matter).
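If all you need is a true/false answer to "is this XML at all?", one option is to let a parser decide instead of a regex. The following is only a minimal sketch, assuming the JDK's built-in JAXP parser; the class and method names are made up for illustration. Note that it only checks well-formedness, not whether the document follows your molecule format.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original question.
class XmlWellFormedCheck {

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormedXml(String xml) {
        if (xml == null || xml.isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted input cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but does not print them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Any parse failure means the input is not well-formed XML.
            return false;
        }
    }
}
```

A call such as `XmlWellFormedCheck.isWellFormedXml(content)` could then replace the regex-based `isValidXML` from the question.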

I suggest you go here: XML validation Oracle Docs

I think it is what you are looking for: in Java you can use schema validation to validate XML, which is what you want to do if I read the question correctly.

What you will have to do is write a schema definition instead of a regex. This is not only the correct and straightforward way to go, it will also be much easier to maintain. It is not rocket science either: your format seems pretty clear and should be easy to condense into an XSD. There are also tools that can help you generate one, though their output might still need fine-tuning.
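To give a rough idea, here is a sketch of schema validation with `javax.xml.validation`. The XSD is my own guess derived from the sample in the question (attribute types, `minOccurs`/`maxOccurs`, and the class and method names are all assumptions), so treat it as a starting point rather than the definitive schema. It is inlined as a string only to keep the example self-contained; normally it would live in its own `.xsd` file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper class; names and schema details are assumptions.
class MoleculeXmlValidator {

    // Guessed XSD for the <molecules> format shown in the question.
    static final String MOLECULES_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"molecules\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"molecule\" maxOccurs=\"unbounded\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"atoms\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"atom\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:attribute name=\"id\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"symbol\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "<xs:element name=\"bonds\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"bond\" minOccurs=\"0\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:attribute name=\"id\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"atomAId\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"atomBId\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "<xs:attribute name=\"order\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"id\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true only if the input is well-formed XML that conforms to the schema above.
    static boolean isValidMoleculeXml(String xml) {
        if (xml == null || xml.isEmpty()) {
            return false;
        }
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(MOLECULES_XSD)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }
}
```

Unlike a pure well-formedness check, this also rejects XML that is syntactically fine but has the wrong root element or is missing a required attribute. In real code you would probably build the `Schema` once and reuse it, since it is thread-safe while `Validator` is not.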

Note: I know that link-only answers are discouraged on SO, but the resource is too big to copy into the answer (at least IMHO). Also, Oracle may hold copyright on it. Since it is the official Oracle documentation, the link should also be unlikely to break.

Fildor