
How can I check the validity of HTML code with Python?

I need to check that tags are closed and that the quotes around tag attribute values match (for example, <a href="xxx'>), along with any other possible validations. Which libraries can I use for this?

Jason Aller
Evg

2 Answers


Well, this isn't exactly what you're looking for, but to validate the HTML for a website I work on, I ask the W3C Validator to check it and just screen-scrape its output for a basic yes/no result. There are several other validation services on the web, but the W3C one works well enough for me. Here is the same approach updated for Python 3:

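#!/usr/bin/env python3
import re
import urllib.parse
import urllib.request


def validate(url):
    """Ask the W3C Markup Validator to check `url`; print and return the yes/no result."""
    validator_url = ("https://validator.w3.org/check?uri=" +
                     urllib.parse.quote_plus(url))
    with urllib.request.urlopen(validator_url) as response:
        output = response.read().decode("utf-8", errors="replace")
    # The success phrase may wrap across lines in the report, so allow any whitespace between words.
    pattern = "This document was successfully checked as".replace(" ", r"\s+")
    valid = re.search(pattern, output) is not None
    print("  VALID:" if valid else "INVALID:", url)
    return valid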
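The check?uri= form asks the validator to fetch a public URL. If the HTML isn't published anywhere yet, one possible alternative is to POST the markup itself to the W3C Nu Html Checker and read a JSON report instead of scraping HTML. This is only a sketch: the endpoint https://validator.w3.org/nu/?out=json and the report layout (a "messages" list whose entries carry a "type" such as "error") are assumptions to verify against the checker's documentation, and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint: the W3C Nu Html Checker with JSON output (verify against its docs).
NU_CHECKER_URL = "https://validator.w3.org/nu/?out=json"


def build_request(html):
    """Build a POST request that sends raw HTML to the checker (no public URL needed)."""
    return urllib.request.Request(
        NU_CHECKER_URL,
        data=html.encode("utf-8"),
        headers={
            "Content-Type": "text/html; charset=utf-8",
            # Hypothetical client name: a descriptive User-Agent is a courtesy to the service.
            "User-Agent": "html-validity-sketch/0.1",
        },
        method="POST",
    )


def errors_from_report(report):
    """Keep only messages whose "type" is "error" (assumed report layout)."""
    return [m for m in report.get("messages", []) if m.get("type") == "error"]


def validate_html(html):
    """POST `html` to the checker; an empty list means no errors were reported."""
    with urllib.request.urlopen(build_request(html)) as response:
        report = json.load(response)
    return errors_from_report(report)
```

Usage would look like: for err in validate_html(page_source): print(err.get('lastLine'), err.get('message')). The lastLine and message field names are likewise assumed from the checker's JSON format.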
Peter Lyons

The html5lib library can perform basic HTML validation. Create its parser with strict=True and it raises a ParseError on the first parse error it encounters:

>>> import html5lib
>>> html5parser = html5lib.HTMLParser(strict=True)
>>> html5parser.parse('<html></html>')
Traceback (most recent call last):
  ...
html5lib.html5parser.ParseError: Unexpected start tag (html). Expected DOCTYPE.
>>> html5parser.parseFragment('<p>Lorem <a href="/foobar">ipsum</a>')
<Element 'DOCUMENT_FRAGMENT' at 0x7f1d4a58fd60>
>>> html5parser.parseFragment('<p>Lorem </a>ipsum<a href="/foobar">')
Traceback (most recent call last):
  ...
html5lib.html5parser.ParseError: Unexpected end tag (a). Ignored.
>>> html5parser.parseFragment('<p><form></form></p>')
Traceback (most recent call last):
  ...
html5lib.html5parser.ParseError: Unexpected end tag (p). Ignored.
>>> html5parser.parseFragment('<option value="example" />')
Traceback (most recent call last):
  ...
html5lib.html5parser.ParseError: Trailing solidus not allowed on element option
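If you only need the closed-tag check from the question and would rather stick to the standard library, a rough sketch is to track open elements with html.parser.HTMLParser and report mismatches. It is a simplification under stated assumptions: it treats every non-void element as needing an explicit end tag (stricter than HTML, which lets you omit e.g. </p> or </li>), and it does not catch mismatched attribute quotes, since html.parser is lenient about those. TagBalanceChecker and check_tags are hypothetical names.

```python
from html.parser import HTMLParser

# HTML "void" elements never take an end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Collect problems with unclosed, stray, or wrongly self-closed tags."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, (line, column)) of elements still open
        self.problems = []       # human-readable problem descriptions

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append((tag, self.getpos()))

    def handle_startendtag(self, tag, attrs):
        # "<br/>" is fine; "<option ... />" does not close a non-void element in HTML.
        if tag not in VOID_ELEMENTS:
            line, col = self.getpos()
            self.problems.append(
                f"self-closing syntax on non-void <{tag}> at line {line}, column {col}")

    def handle_endtag(self, tag):
        line, col = self.getpos()
        if tag not in [name for name, _ in self.open_elements]:
            self.problems.append(f"stray </{tag}> at line {line}, column {col}")
            return
        # Pop back to the matching start tag; anything popped on the way was left open.
        while self.open_elements:
            name, (l, c) = self.open_elements.pop()
            if name == tag:
                break
            self.problems.append(
                f"<{name}> opened at line {l}, column {c} is not closed before </{tag}>")


def check_tags(html):
    """Return a list of tag-balance problems; an empty list means none were found."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    for tag, (line, col) in checker.open_elements:
        problems.append(f"<{tag}> opened at line {line}, column {col} is never closed")
    return problems
```

For example, check_tags('<p>Lorem </a>ipsum') reports the stray </a> and the unclosed <p>, while check_tags('<p>Lorem <a href="/foobar">ipsum</a></p>') returns an empty list.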
Changaco