
The lxml documentation says:

lxml can interface to the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.

Meanwhile, BS can also use lxml as its parser [ref]:

Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the lxml parser.

BS also recommends using lxml as the parser for speed.

So what happens if lxml uses BS for parsing while BS, in turn, uses lxml as its parser?

I have been scratching my head trying to understand their relationship. Help.

nn0p
  • possible duplicate of [Parsing HTML in python - lxml or BeautifulSoup? Which of these is better for what kinds of purposes?](http://stackoverflow.com/questions/1922032/parsing-html-in-python-lxml-or-beautifulsoup-which-of-these-is-better-for-wha) – Anthony Geoghegan Aug 25 '15 at 08:49
  • I upvoted your question as I thought it was a good one. Then I noticed the top-listed Related question and followed the link. I thought the answer at http://stackoverflow.com/a/19549530/1640661 was excellent and helps clarify the relationship between the functionality of these two libraries. – Anthony Geoghegan Aug 25 '15 at 08:51

1 Answer

There is nothing confusing about the BS parser and the lxml.html parser. BS has an HTML parser, and lxml has its own HTML parser.

The BS documentation you quoted simply says that you can parse HTML into a BS soup object using the lxml parser, or another supported third-party parser, as an alternative to the default BS parser:

BeautifulSoup(markup, "lxml")

Similarly, the lxml documentation says that you can parse HTML into an lxml tree object using the BS parser, as an alternative to the default lxml.html parser:

root = lxml.html.soupparser.fromstring(tag_soup)
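
To make the two directions concrete, here is a minimal sketch that puts both calls side by side. It assumes both the `bs4` and `lxml` packages are installed. The sample markup and the helper names `bs_tree_via_lxml` and `lxml_tree_via_bs` are made up for illustration; only `BeautifulSoup(markup, "lxml")` and `lxml.html.soupparser.fromstring()` come from the two libraries.

```python
from importlib.util import find_spec

# Hypothetical sample input: the <b> tag is deliberately left unclosed.
TAG_SOUP = "<p>Hello <b>world</p>"


def bs_tree_via_lxml(markup):
    """Direction 1: the result is a Beautiful Soup object; lxml does the parsing."""
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, "lxml")


def lxml_tree_via_bs(markup):
    """Direction 2: the result is an lxml element; Beautiful Soup does the parsing."""
    from lxml.html import soupparser
    return soupparser.fromstring(markup)


# Only run the demo when both third-party packages are available.
HAVE_BOTH = find_spec("bs4") is not None and find_spec("lxml") is not None

if HAVE_BOTH:
    soup = bs_tree_via_lxml(TAG_SOUP)
    root = lxml_tree_via_bs(TAG_SOUP)
    print(type(soup).__name__)  # class of the Beautiful Soup result
    print(type(root).__name__)  # class of the lxml result
    print(soup.get_text())      # text as seen through the BS API
    print("".join(root.itertext()))  # text as seen through the lxml API
```

The point is that you choose one entry point per call: the tree type you get back (and therefore the API you navigate it with) depends on which library you called, while the other library only supplies the parsing step.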
har07