
Possible Duplicate:
Recommend a HTML Validator in java
How to validate HTML from Java?

How can I check whether a string contains valid HTML? Whitespace before and after the HTML code should be allowed.

For example, the string <html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html> would return true since it is valid HTML.

But the following string <html><body><h1>p>My first paragraph.</p></body></html> would return false since it is not valid HTML.

user906153
  • I believe this is a very similar question with a good answer: http://stackoverflow.com/questions/4217801/recommend-a-html-validator-in-java – Fedor Skrynnikov Dec 19 '11 at 20:52
  • Ultimately this is almost impossible as you can get as detailed as which engine would render this and which wouldn't. You can validate vs regex if you'd like. You can also check to see if it's valid XML... but HTML != XML and unfortunately there are bad websites out there which render fine but are not valid XML. http://www.regular-expressions.info/examples.html – SQLMason Dec 19 '11 at 20:54

1 Answer


It is best to use an HTML parser; JTidy might be a good fit.
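To illustrate the parse-it-and-see-if-it-fails idea without any extra dependency, here is a minimal sketch that uses the JDK's built-in XML parser instead of JTidy. This is a substitution, not the JTidy approach itself: it only tests XML well-formedness, which is stricter than HTML (an unclosed <br> would be rejected, for instance), so treat it as a rough approximation. The class name MarkupCheck and the method name isWellFormed are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks XML well-formedness as a stand-in for HTML validity.
class MarkupCheck {

    static boolean isWellFormed(String markup) {
        if (markup == null) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch external DTDs named by the input (JAXP 1.5+ property).
            factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors,
            // rethrows fatal ones, and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // trim() tolerates whitespace before and after the markup.
            builder.parse(new InputSource(new StringReader(markup.trim())));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException
                | IllegalArgumentException e) {
            // Mismatched or unclosed tags, empty input, or an unsupported
            // parser setting all count as "not well-formed" here.
            return false;
        }
    }
}
```

With the two strings from the question, isWellFormed returns true for the first one (also when padded with whitespace) and false for the second, because the closing </p> does not match the open <h1>. Real-world HTML that browsers render happily can still fail this check, which is where a lenient HTML parser such as JTidy would be the better tool.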

duffymo