Using the Python documentation I found the HTML parser, but I have no idea which library to import to use it. How do I find this out (bearing in mind it doesn't say on the page)?
8 Answers
You probably really want BeautifulSoup; check the link for an example.
But in any case:
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.feed('<html></html>')
>>> h.get_starttag_text()
'<html>'
>>> h.close()

Try:
import HTMLParser
In Python 3.0, the HTMLParser module was renamed to html.parser; you can read about this here.
Python 3.0 and above:
import html.parser
Python 2.2 through 2.7:
import HTMLParser
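If the same script has to run on both versions, one option is to try the Python 3 name first and fall back to the Python 2 one. The sketch below does that and then subclasses the parser to collect link targets; `LinkCollector` and `collect_links` are hypothetical names made up for this illustration, not part of the standard library.

```python
# Import HTMLParser under whichever module name this interpreter provides.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        # Explicit base-class call works on both Python 2 and Python 3.
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html_text):
    """Feed the markup to the parser and return the collected hrefs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(collect_links('<p><a href="/a">A</a> and <a href="/b">B</a></p>'))
```

The parser is event-driven: you override methods such as `handle_starttag` and it calls them as it walks through the markup.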
`https://docs.python.org/2/library/htmlparser.html` – noobninja May 18 '16 at 00:41
I would recommend using the Beautiful Soup module instead; it has good documentation.
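As a rough idea of what that looks like, here is a minimal sketch. It assumes the current `beautifulsoup4` package (imported as `bs4`) is installed, and `page_title_and_links` is a hypothetical helper name used only for this example.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def page_title_and_links(html_text):
    """Return the page title (or None) and the href of every link."""
    # "html.parser" tells Beautiful Soup to use the standard-library parser.
    soup = BeautifulSoup(html_text, "html.parser")
    title = soup.title.string if soup.title else None
    # href=True skips <a> tags that have no href attribute.
    links = [a["href"] for a in soup.find_all("a", href=True)]
    return title, links


if __name__ == "__main__":
    sample = (
        "<html><head><title>Demo</title></head>"
        '<body><a href="/x">x</a><a>no href</a></body></html>'
    )
    print(page_title_and_links(sample))
```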

You should also look at html5lib for Python as it tries to parse HTML in a way that very much resembles what web browsers do, especially when dealing with invalid HTML (which is more than 90% of today's web).

You may be interested in lxml. It is a separate package and has C components, but it is the fastest. It also has a very nice API, allowing you to easily list links in HTML documents, list forms, sanitize HTML, and more. It can also parse HTML that is not well-formed (this is configurable).
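Listing the links in a document, for instance, might look like the sketch below. It assumes the `lxml` package is installed; `list_links` is a hypothetical helper name for this illustration.

```python
import lxml.html  # third-party: pip install lxml


def list_links(html_text):
    """Return every link found in the document, in document order."""
    doc = lxml.html.fromstring(html_text)
    # iterlinks() yields (element, attribute, link, pos) tuples.
    return [link for _element, _attr, link, _pos in doc.iterlinks()]


if __name__ == "__main__":
    print(list_links('<div><a href="/a">A</a><a href="/b">B</a></div>'))
```

Note that `iterlinks()` reports links from any attribute that carries one (for example `src` on images), not only `href` on anchors.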

I don't recommend BeautifulSoup if you want speed. lxml is much, much faster, and you can fall back on lxml's BeautifulSoup-based soupparser if the default parser doesn't work.

I agree, BeautifulSoup is only useful when parsing a handful of files; there are too many memory leaks. – DrDee Sep 16 '10 at 15:51
There's a link to an example at the bottom of http://docs.python.org/2/library/htmlparser.html; it just doesn't work with the original Python or Python 3. It has to be Python 2, as it says at the top.

For real-world HTML processing I'd recommend BeautifulSoup. It is great and takes away much of the pain. Installation is easy.
