-4

Possible Duplicate:
regular expression to check if string is valid XML

I am looking for a regular expression to check whether a string is valid XHTML.

Example:

<h2>Legal HTML Entity References</h2><table align="center" border="0" ><tr></tr></table>
Manish
    You really need to read something about the Chomsky language hierarchy and formal grammars. A regular expression can only recognize a regular language, and XHTML is not a regular language. See http://en.wikipedia.org/wiki/Formal_grammar#The_Chomsky_hierarchy – Robert Balent Sep 27 '11 at 07:48

4 Answers

5

This sounds like a bad idea: The language of valid XHTML strings is not regular.

Use an HTML parsing library instead. A few examples:


Related question:

aioobe
1

Regex is exactly the wrong tool to use.

HTML is not a regular language and hence cannot be parsed by regular expressions.
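To make the "not regular" point concrete, here is a minimal sketch (my own illustration, not taken from the linked post). A regex is fine for picking out individual tags, but checking that they nest correctly needs a stack that can grow without bound, which is exactly what a regular expression lacks. The class name `NestingSketch` and the toy tag pattern are assumptions; attributes, self-closing tags, comments and CDATA are deliberately ignored, so this is not an XHTML validator.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingSketch {
    // Toy tokenizer: matches <name> or </name> only (no attributes, no <br/>).
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)>");

    // Returns true if every opening tag is closed in the right order.
    static boolean isBalanced(String input) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                open.push(name);              // opening tag: remember it
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;                 // closing tag with no matching opener
            }
        }
        return open.isEmpty();                // anything left open is an error
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("<a><b></b></a>"));
        System.out.println(isBalanced("<a><b></a></b>"));
    }
}
```

The regex does the regular part (tokenizing); the `Deque` does the part no regex can do, because the nesting depth is not bounded in advance.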

See Jeff's post on the subject here: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

Since you've tagged this post Java, you should look at using one of the myriad of HTML parsing libraries available.
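As a rough sketch of the parser route, the JAXP parser that ships with the JDK can at least tell you whether a string is well-formed XML, which is a precondition for valid XHTML. The class name `WellFormedCheck` is made up for this example. Two assumptions to note: the `disallow-doctype-decl` feature is specific to the Xerces-based parser bundled with the JDK, and with it enabled any input carrying a DOCTYPE is rejected, so this only suits DOCTYPE-less snippets like the one in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the markup parses as well-formed XML (not full XHTML validity).
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: JDK-bundled Xerces. Rejects DOCTYPEs so no external DTD is fetched.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;                     // parse error: not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragment = "<h2>Legal HTML Entity References</h2>"
                + "<table align='center' border='0'><tr></tr></table>";
        System.out.println(isWellFormed(fragment));
        System.out.println(isWellFormed("<div>" + fragment + "</div>"));
    }
}
```

Note that the snippet from the question has two top-level elements, so on its own it is not a well-formed XML document; it needs to be wrapped in a single root element (a `<div>` here) before a parser will accept it.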

Paolo
1

Have a look here to see why parsing HTML with regular expressions won't work reliably: RegEx match open tags except XHTML self-contained tags

XHTML is just another flavor of HTML (HTML reformulated as XML), so you're better off using a real validator such as JTidy.

Thomas
0

Try to check it with a parser. Don't do it the Cthulhu Way.

Here you can find a starting point and some examples on how to do it: The Java XML Validation API
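For a feel of what that API looks like, here is a minimal sketch using `javax.xml.validation` from the JDK. The class name `SchemaValidationSketch` and the tiny inline schema `TOY_XSD` are assumptions made up for illustration; the toy schema only describes a `<table>` containing empty `<tr>` elements and is not the XHTML schema. For real XHTML you would load the actual XHTML schema files instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaValidationSketch {
    // Hypothetical toy schema: <table> with zero or more empty <tr> children.
    static final String TOY_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='table'><xs:complexType><xs:sequence>"
        + "<xs:element name='tr' minOccurs='0' maxOccurs='unbounded'>"
        + "<xs:complexType/></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if xml is well-formed and valid against the given XSD text.
    static boolean isValid(String xml, String xsd) {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema;
        try {
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
        try {
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;                     // malformed or invalid document
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<table><tr/></table>", TOY_XSD));
        System.out.println(isValid("<table><td/></table>", TOY_XSD));
    }
}
```

The point of the design is that the grammar lives in the schema and the parser enforces it, so you never have to encode nesting rules in a pattern.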

Xavi López