
For educational purposes, I would like to find a way to detect poorly formed HTML. I thought one of the well-known packages might have some kind of strict parsing flag, but my searches have not turned one up. From what I have found online so far, BeautifulSoup, HTMLParser (https://docs.python.org/2/library/htmlparser.html#module-HTMLParser), and the others mentioned at https://stackoverflow.com/a/2680724/1339950 do not seem to have a way to do this.

Every answer I can find, on SO and elsewhere, is about how to tolerate malformed HTML. I also looked through BeautifulSoup for something like a strict=True flag, but I don't see anything like that.

My question: Can anyone advise me on an approach to detecting poorly formed HTML for educational purposes? A Python solution is strongly preferred because it needs to be integrated into an existing automated system, but other languages, frameworks, or applications will do if there is nothing in Python.
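
To make the goal concrete, here is a minimal sketch of the kind of strict check being asked about, assuming that "poorly formed" means only unbalanced or mis-nested tags. It uses the standard library's html.parser.HTMLParser (Python 3). The names TagBalanceChecker and find_tag_errors are hypothetical, and the check is stricter than the HTML specification, which allows some closing tags (for example </p> and </li>) to be omitted. It is not a full validator.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records tags closed out of order or without an opener."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, (line, column)) for unclosed tags
        self.errors = []     # human-readable problem reports

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line, col = self.getpos()
        if not self.open_tags:
            self.errors.append(
                f"line {line}, col {col}: </{tag}> has no matching opening tag")
        elif self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            expected = self.open_tags[-1][0]
            self.errors.append(
                f"line {line}, col {col}: expected </{expected}> but found </{tag}>")
            # If the tag was opened earlier, drop everything above it so that
            # one mistake does not cascade into many reports.
            if tag in [name for name, _ in self.open_tags]:
                while self.open_tags.pop()[0] != tag:
                    pass


def find_tag_errors(html):
    """Return a list of tag-balance problems found in the given HTML string."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    errors = list(checker.errors)
    # Anything still on the stack was opened but never closed.
    for tag, (line, col) in checker.open_tags:
        errors.append(f"line {line}, col {col}: <{tag}> is never closed")
    return errors


if __name__ == "__main__":
    for problem in find_tag_errors("<div><span>text</div>"):
        print(problem)
```

Because find_tag_errors returns a plain list of messages, an automated system could treat an empty list as a pass and show the messages to students otherwise.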

philologon

0 Answers