I'm looking for the best way to convert HTML to text, using only modules from the Python 2.7.x standard library. (I.e., no BeautifulSoup, etc.)
By HTML-to-text conversion I mean the moral equivalent of lynx -dump. In fact, just getting rid of HTML tags intelligently, and converting all HTML entities to ASCII (or to UTF-8-encoded Unicode), would suffice.
No regex-based answers, please. (Regexes are not up to the task.)
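To make the requirement concrete, here is a rough sketch of the kind of behavior I mean, built on the standard library's HTMLParser. The names TextExtractor and html_to_text are made up for illustration, the list of block-level tags is an arbitrary assumption, and the result is far from what lynx -dump produces (no link list, no table layout). The import fallback is there only so the same snippet also runs on Python 3.

```python
try:
    from HTMLParser import HTMLParser            # Python 2.7
    from htmlentitydefs import name2codepoint
except ImportError:
    from html.parser import HTMLParser           # Python 3
    from html.entities import name2codepoint

try:
    unichr                                       # exists on Python 2 only
except NameError:
    unichr = chr


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping <script> and <style>."""

    SKIP = ('script', 'style')
    # Assumed set of tags that should start a new line in the output.
    BLOCK = ('p', 'br', 'div', 'li', 'tr', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6')

    def __init__(self):
        HTMLParser.__init__(self)                # old-style class on 2.7, so no super()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(u'\n')

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(u'\n')

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        # Named entity such as &amp;. Python 3 decodes these before
        # handle_data, so this is only called on Python 2.
        if self.skip_depth == 0:
            code = name2codepoint.get(name)
            if code is None:
                self.parts.append(u'&' + name + u';')   # unknown: keep as-is
            else:
                self.parts.append(unichr(code))

    def handle_charref(self, name):
        # Numeric reference such as &#33; or &#x21; (Python 2 only, as above).
        if self.skip_depth == 0:
            if name[0] in 'xX':
                code = int(name[1:], 16)
            else:
                code = int(name)
            self.parts.append(unichr(code))


def html_to_text(html):
    """Return the text of `html` (ideally a unicode string), one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = u''.join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [u' '.join(line.split()) for line in text.split(u'\n')]
    return u'\n'.join(line for line in lines if line)


print(html_to_text(u'<p>Caf&eacute; &amp; bar</p><script>x=1</script><p>Hi&#33;</p>'))
```

Encoding the returned unicode string with .encode('utf-8') would give the UTF-8 bytes mentioned above. Something more robust than this sketch, still standard-library only, is what I'm after.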
Thanks!