html5lib is a library for parsing and serializing HTML documents and fragments in Python, with ports to Dart, PHP, and Ruby.
html5lib is an open-source HTML parser for Python, based on the HTML specification. There are ports for PHP and Ruby (both unmaintained), as well as a third-party one for Dart.