How to validate HTML as well-formed instead of robustly tolerating invalid HTML?

Asked Apr 27 '16 at 22:17


For educational purposes, I would like to find a way to detect poorly formed HTML. I thought one of the well-known packages might have some kind of strict parsing flag, but my searches have not turned one up. From what I have found online so far, BeautifulSoup, HTMLParser (https://docs.python.org/2/library/htmlparser.html#module-HTMLParser), and the others mentioned in https://stackoverflow.com/a/2680724/1339950 do not seem to have a way to do this.

It seems that all the answers, on SO and elsewhere, are about how to tolerate malformed HTML. I also looked in BeautifulSoup to see whether it had some kind of strict=True flag, but I don't see anything like that.

My question: can anyone advise me on an approach to detecting poorly formed HTML for educational purposes? A Python solution is strongly preferred because it needs to be integrated into an existing automated system, but other languages, frameworks, or applications will do if there is nothing in Python.
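Since HTMLParser has no strict mode, one possible approach is to use it only as a tokenizer and do the well-formedness bookkeeping yourself. The sketch below assumes Python 3, where the module is html.parser rather than the Python 2 HTMLParser module linked above. TagBalanceChecker and check_html are hypothetical names, and the void-element list is the one from the current HTML standard. It checks only that tags are closed and properly nested, not attribute syntax or which elements may appear inside which.

```python
from html.parser import HTMLParser

# Void elements in current HTML: they never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unclosed, stray, and misnested tags.

    HTMLParser itself never raises on bad markup, so this subclass keeps its
    own stack of open elements and writes a message for each problem it sees.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []     # (tag, (line, col)) for every element still open
        self.problems = []  # human-readable messages, in document order

    def _report(self, message):
        line, col = self.getpos()  # position of the tag being handled
        self.problems.append(f"line {line}, col {col}: {message}")

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names before calling this method.
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()))

    def handle_startendtag(self, tag, attrs):
        # "<br/>" is fine; "<div/>" does not really close the div in HTML,
        # so report it instead of silently accepting it.
        if tag not in VOID_ELEMENTS:
            self._report(f"self-closing syntax on non-void <{tag}/>")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            self._report(f"end tag </{tag}> on void element")
        elif self.stack and self.stack[-1][0] == tag:
            self.stack.pop()  # the normal, properly nested case
        elif any(open_tag == tag for open_tag, _ in self.stack):
            self._report(f"</{tag}> found while <{self.stack[-1][0]}> is still open")
            # Recover by closing everything down to (and including) the match.
            while self.stack.pop()[0] != tag:
                pass
        else:
            self._report(f"stray end tag </{tag}> with no matching start tag")


def check_html(text):
    """Return a list of problems; an empty list means the tags are balanced."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    for tag, (line, col) in checker.stack:
        checker.problems.append(f"line {line}, col {col}: <{tag}> is never closed")
    return checker.problems


if __name__ == "__main__":
    sample = "<div>\n  <p>Some <b>bold</p>\n</div>\n<p>unclosed"
    for problem in check_html(sample):
        print(problem)
```

On that sample it should flag the out-of-order </p> on line 2 and the <p> left open on line 4. Popping back to the matching element after a misnesting keeps one mistake from cascading into a long list of follow-on errors.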

edited May 23 '17 at 12:07 by Community

asked by philologon


I think these are some useful references: http://stackoverflow.com/questions/35538/validate-xhtml-in-python. – alecxe Apr 28 '16 at 02:03
Oh yeah. Your link looks awesome. Will pursue tomorrow. – philologon Apr 28 '16 at 02:16
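The linked thread is about XHTML. If the course can require XHTML-style markup, a stricter option that needs only the standard library is to parse the document as XML and let the parser reject anything that is not well-formed. This is a sketch under that assumption; check_well_formed_xml is a hypothetical name. XML rules are stricter than HTML ones: <br> must be written <br/>, every attribute value must be quoted, there must be a single root element, and named HTML entities such as &nbsp; are generally rejected, since XML predefines only five (&amp; &lt; &gt; &quot; &apos;).

```python
import xml.etree.ElementTree as ET


def check_well_formed_xml(text):
    """Return None if text parses as well-formed XML, else (line, column, message).

    Sketch only: this applies XML well-formedness rules (the XHTML view of
    "well-formed"), not HTML's more forgiving parsing rules.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return (line, column, str(err))
    return None


if __name__ == "__main__":
    for snippet in ["<p>line one<br/>line two</p>", "<p>line one<br>line two</p>"]:
        print(repr(snippet), "->", check_well_formed_xml(snippet) or "well-formed")
```

The parser stops at the first error, so this reports one problem per run; the tag-balance approach based on HTMLParser can keep going and collect several.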
