1

Is there a regex for checking whether XML is well-formed?

Thanks

Edit: If not a regex, then is there a good parsing method that I can use in C# that doesn't throw an exception? I tried using XmlReader but it didn't work for me.

devforall
  • What are you trying to accomplish? – Jim Garrison Nov 17 '09 at 23:06
  • http://www.codinghorror.com/blog/archives/001311.html – Harold L Nov 17 '09 at 23:06
  • Disclaimer: SO isn't usually this fast. It's just that the topic has come up a lot recently, and it was even mentioned in the podcast. – Stefan Kendall Nov 17 '09 at 23:13
  • Jim Garrison - if I had to guess, I'd say the goal is to check an XML doc or fragment for well-formedness. In which case, there's usually a class (depending on the programming framework used) that represents an XmlDocument. Whether it's JavaScript hosted in the browser, or Java, or C#, or PHP, there are classes representing XML documents that can check for well-formedness. Often they do it implicitly. A regex is not the right tool for the job. – Cheeso Nov 17 '09 at 23:20
  • We get XML through a REST-based web service. I just want to check every time, before processing, that what we received is really XML and not HTML or text. – devforall Nov 17 '09 at 23:43
  • Define "it didn't work for me". – Marc Gravell Nov 18 '09 at 01:25

9 Answers

7

This is well beyond the capabilities of regular expressions. In other words, the answer is that it's not possible.

EDIT: There are plenty of tools available to check well-formedness, but they all involve some sort of XML parser/validator. If you provide more information about your environment maybe we can point you in the right direction.

Jim Garrison
  • You're answering the question as stated, but maybe not providing the information sought. A better answer might be, "use an XmlDocument object or similar to verify well-formedness." Sometimes people don't know the right tool for the job, though they know the job they want to do. – Cheeso Nov 17 '09 at 23:18
  • @Cheeso - You are right, I could have been a little more helpful. I've edited the post. Thx. – Jim Garrison Nov 17 '09 at 23:25
6

No.

XML syntax is irregular enough to give any regular expression nightmares.

You're not the first to ask this, but don't feel bad: questions about parsing HTML and XML with regular expressions will keep being asked, because regular expressions look perfect for the job. Sadly, they aren't.

XML syntax is complex enough that you can't safely parse it with a regex. It looks simple and regular but there's plenty of scope for causing problems. One nasty CDATA section and things get very hard. And consider the RSS feeds where you get HTML embedded in the XML.
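
As a rough illustration of how a plausible-looking pattern goes wrong, here is a minimal C# sketch. The pattern, the class name `NaiveRegexCheck`, and the method name `LooksLikeXml` are all made up for this example; they are not taken from any library or from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaiveRegexCheck
{
    // Hypothetical pattern: an opening root tag, anything at all,
    // then a closing tag with the same name as the root.
    private static readonly Regex Naive =
        new Regex(@"^\s*<(\w+)[^>]*>.*</\1>\s*$", RegexOptions.Singleline);

    public static bool LooksLikeXml(string text)
    {
        return Naive.IsMatch(text);
    }

    public static void Main()
    {
        // Not well-formed (<b> is never closed), yet the pattern accepts it.
        Console.WriteLine(LooksLikeXml("<a><b></a>"));  // True

        // Well-formed (self-closing root element), yet the pattern rejects it.
        Console.WriteLine(LooksLikeXml("<a/>"));        // False
    }
}
```

Patching the pattern for these two inputs only moves the problem: comments, CDATA sections, attribute values containing `>`, and arbitrary nesting each need yet another special case.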

So please use an XML parsing library for this. There are plenty of them.

If you want more detail, have a look at this question, which gives some examples of the horror syntax you can meet, and this question, which shows what happens if you do try to parse these things with regular expressions.

Community
David Webb
  • Dave, XML doesn't look regular in the sense of regular expressions. Please look up regular grammars/languages in Wikipedia. – Svante Nov 17 '09 at 23:50
2

There is no regex solution, because Jeff told me so.

Cœur
Stefan Kendall
2

If not a regex, then is there a good parsing method that I can use in C# that doesn't throw an exception? I tried using XmlReader but it didn't work for me.

Using XmlReader and while(reader.Read()) {} (catching any exception) is probably the fastest pure managed approach.
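
A minimal sketch of that loop, wrapped so the caller gets a bool instead of an exception. The class name `XmlCheck`, the method name `IsWellFormed`, and the reader settings are assumptions made for this example, not part of the answer; it also catches only `XmlException` rather than every exception.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlCheck
{
    // Returns true when 'text' parses as a single well-formed XML document.
    // The settings below are illustrative choices, not requirements.
    public static bool IsWellFormed(string text)
    {
        if (text == null) return false;

        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Document, // expect exactly one root element
            DtdProcessing = DtdProcessing.Ignore          // skip any DOCTYPE instead of processing it
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Read() throws XmlException at the first well-formedness error.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<a><b/></a>"));  // True
        Console.WriteLine(IsWellFormed("<a><b></a>"));   // False
        Console.WriteLine(IsWellFormed("plain text"));   // False
    }
}
```

The exception is still thrown internally; it just never reaches the caller. Note that well-formed XHTML passes this check too, so telling XML apart from HTML may also require looking at the root element name.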

Marc Gravell
1

No, there is not. (Practically speaking and for the general case, at least.) Use an XML parser if you want to determine whether or not XML is well-formed; a validating one is not required, since every conforming parser must report well-formedness errors.

Corey Porter
1

Use an XML validator instead.

Percutio
1

No, unless you count recursive regexps. Plain regexps can't check arbitrary nesting. However, some regexp engines accept recursive patterns, which you could try using for this purpose.

Dmitry
0

Recent versions of PCRE have all kinds of features which would make this achievable, but the code would be ugly as hell. libxml2 comes with xmllint, so why not use the right tool for the job?

just somebody
0

I'm making an assumption here: you think that using a library will be too slow or too heavyweight to do this quickly and/or efficiently.

If this is the case then test it out. Try a few libraries, see how big they are, see how fast they are.

Fortyrunner