lxml is a full-featured, high-performance Python library for processing XML and HTML.
Use this tag for questions about the lxml Python library. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package handles XML processing, and its lxml.html package handles HTML. For badly broken HTML, lxml.html.soupparser delegates parsing to BeautifulSoup, and lxml.html.html5parser uses html5lib to apply the HTML5 parsing algorithm.
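As a quick orientation, here is a minimal sketch of the two most commonly used packages, assuming lxml is installed. The sample documents and variable names are made up for illustration.

```python
from lxml import etree, html

# lxml.etree: parse well-formed XML and query it with XPath.
# The catalog document is a made-up sample.
root = etree.fromstring("<catalog><book id='1'><title>Dune</title></book></catalog>")
titles = root.xpath("//book/title/text()")

# lxml.html: parse HTML with unclosed <p> and <b> tags.
# The libxml2 HTML parser recovers instead of raising an error.
doc = html.fromstring("<div><p>Hello <b>world</div>")
text = doc.text_content()

print(titles)
print(text)
```

For markup that the default HTML parser handles poorly, the soupparser and html5parser modules are possible alternatives, but they require BeautifulSoup and html5lib respectively to be installed.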
Links:
https://lxml.de/ - Contains API documentation and tutorials
https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml