1

Possible Duplicate:
How to check for valid xml in string input before calling .LoadXml()

In a program I am implementing, the user can input a string. If the string is valid XML, I have to treat it differently. Is there a regex that can determine whether a given string is valid XML?

InfoLearner
  • 4
    Regex is the *completely* wrong tool for that. It's like trying to build a skyscraper with a toothbrush. Even assuming you mean "well-formed", and not "valid" in the XML spec sense (i.e. both well-formed and conforming to a schema). –  Nov 21 '12 at 16:33
  • 3
    Why not try loading it into an XmlDocument or XDocument and looking for parse exceptions? See http://stackoverflow.com/questions/86292/how-to-check-for-valid-xml-in-string-input-before-calling-loadxml or http://stackoverflow.com/questions/1026247/check-well-formed-xml-without-a-try-catch or http://stackoverflow.com/questions/1490053/how-to-tell-if-a-string-is-xml – dash Nov 21 '12 at 16:32
  • Maybe [this](http://codereview.stackexchange.com/questions/15100/checking-and-returning-xml-from-a-byte-array) can help! – Steve B Nov 21 '12 at 16:36
  • I think you ought to read http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Chris Martin Nov 21 '12 at 16:38

1 Answer

3

It is not possible to construct a "plain" regex that distinguishes strings with balanced parentheses from strings with unbalanced ones: regular expressions in the formal sense describe only regular languages, and matching arbitrarily deep nesting requires unbounded memory that a finite automaton does not have. For the same reason, it is theoretically impossible to construct a regular expression that recognizes well-formed XML*.
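To make the counting argument concrete, here is a minimal sketch of a balanced-parentheses check written as ordinary code. The class and method names are made up for illustration. The point is the `depth` counter: it can grow without bound, which is exactly the kind of state a plain regular expression cannot keep.

```csharp
static class NestingDemo
{
    // Returns true when every '(' has a matching ')' in the right order.
    public static bool IsBalanced(string s)
    {
        int depth = 0; // unbounded counter: the state a finite automaton lacks
        foreach (char c in s)
        {
            if (c == '(') depth++;
            else if (c == ')' && --depth < 0) return false; // closed more than opened
        }
        return depth == 0; // nothing may be left open
    }
}
```

XML nesting is harder still, because a parser has to remember which element names are open, not just how many.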

Even if it were theoretically possible to construct such an expression, it would not be practical: the amount of work needed to build the recognizer would be roughly the same as the amount of work needed to build a full-fledged XML parser.

Luckily, .NET comes with several XML parsers already. A piece of code like this

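using System.Xml;       // XmlException
using System.Xml.Linq;  // XDocument

// Returns true and the parsed document if s is well-formed XML.
bool TryGetValidXml(string s, out XDocument res) {
    try {
        // Parse reads XML from the string itself; Load(string) would treat s as a file path or URI.
        res = XDocument.Parse(s);
        return true;
    } catch (XmlException) {
        res = null;
        return false;
    }
}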

would do the trick.
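As a rough usage sketch, the caller can branch on the result and reuse the parsed document instead of parsing twice. The `InputClassifier` class, `TryParseXml`, and `Describe` are hypothetical names introduced here for illustration. The helper uses `XDocument.Parse`, which reads XML from the string content, and it checks well-formedness only, not validity against a schema.

```csharp
using System.Xml;
using System.Xml.Linq;

static class InputClassifier
{
    // Hypothetical helper: true plus the parsed document when s is well-formed XML.
    public static bool TryParseXml(string s, out XDocument doc)
    {
        doc = null;
        if (string.IsNullOrWhiteSpace(s)) return false; // nothing to parse
        try
        {
            doc = XDocument.Parse(s);
            return true;
        }
        catch (XmlException)
        {
            return false; // not well-formed
        }
    }

    // Hypothetical dispatch: reports how the input would be treated.
    public static string Describe(string input)
    {
        return TryParseXml(input, out XDocument doc)
            ? "XML with root <" + doc.Root.Name.LocalName + ">"
            : "plain text";
    }
}
```

Catching only `XmlException` keeps unrelated failures visible rather than silently classifying them as "not XML".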


* There are various extensions, such as Perl's recursive regex and .NET balancing groups, that let you work around this limitation.
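For illustration only, this is roughly what the .NET balancing-group workaround looks like for the parentheses case. The class name is made up, and the pattern is an adaptation of the commonly shown angle-bracket example. It handles parentheses, not XML; stretching the same idea to real XML is where it stops being practical.

```csharp
using System.Text.RegularExpressions;

static class BalancedParens
{
    // (?<Open>\() pushes a capture for each '(' and (?<-Open>\)) pops one for each ')'.
    // (?(Open)(?!)) forces a failure if any '(' is still unmatched at the end.
    private static readonly Regex Pattern = new Regex(
        @"^[^()]*(((?<Open>\()[^()]*)+((?<-Open>\))[^()]*)+)*(?(Open)(?!))$");

    public static bool IsBalanced(string s) => Pattern.IsMatch(s);
}
```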
Sergey Kalinichenko
  • To avoid double work, I would add an `out XDocument validDocument` parameter so the caller gets the document directly, instead of first checking and then parsing, which would mean parsing twice – Steve B Nov 21 '12 at 17:00
  • @SteveB You are right, this is a good idea. The OP is going to need that XML anyway, so there's no reason to throw away the parsed document. – Sergey Kalinichenko Nov 21 '12 at 17:04
  • I've submitted a [code review](http://codereview.stackexchange.com/questions/15100/checking-and-returning-xml-from-a-byte-array) in order to speed up the process by running some preflight checks before the actual parsing. – Steve B Nov 21 '12 at 17:14