424
...
soup = BeautifulSoup(html, "lxml")
File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__
% ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

The above is the output in my Terminal. I am on Mac OS 10.7.x with Python 2.7.1, and I followed this tutorial to get Beautiful Soup and lxml, which both installed successfully and work with a separate test file located here. In the Python script that causes this error, I have included this line:

from pageCrawler import comparePages

And in the pageCrawler file I have included the following two lines:

from bs4 import BeautifulSoup
from urllib2 import urlopen

How can this problem be solved?

– Dharman, user3773048

21 Answers

446

I have a suspicion that this is related to the parser that BS will use to read the HTML. The documentation is here, but if you're like me (on OSX) you might be stuck with something that requires a bit of work:

You'll notice that the BS4 documentation page above points out that, by default, BS4 will use the Python built-in HTML parser. Assuming you are on OSX, the Apple-bundled version of Python is 2.7.2, which is not lenient with character formatting. I hit this same problem, so I upgraded my version of Python to work around it. Doing this in a virtualenv will minimize disruption to other projects.

If doing that sounds like a pain, you can switch over to the LXML parser:

pip install lxml

And then try:

soup = BeautifulSoup(html, "lxml")

Depending on your scenario, that might be good enough. I found this annoying enough to warrant upgrading my version of Python. Using virtualenv, you can migrate your packages fairly easily.
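
If you go the virtualenv route, a rough sketch of the whole sequence looks something like this (the environment name scraper-env is arbitrary, and the one-liner at the end is just a smoke test):

virtualenv scraper-env            # create an isolated environment (any name works)
source scraper-env/bin/activate   # activate it
pip install beautifulsoup4 lxml   # install BS4 plus the lxml parser inside the env
python -c 'from bs4 import BeautifulSoup; print(BeautifulSoup("<p>hi</p>", "lxml").p.text)'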

– user124384, James Errico
  • To test after pip install: `python -c 'import requests ; from bs4 import BeautifulSoup ; r = requests.get("https://www.allrecipes.com/recipes/96/salad/") ; soup = BeautifulSoup(r.text, "lxml") '` – ViFI Mar 02 '19 at 19:40
  • In my virtual env, I needed to install `requests`, `bs4` and `lxml` before `BeautifulSoup` would parse my webpage content. – noobninja Nov 25 '19 at 19:22
  • Uff! Mad Mac, I don't know when I'll stop regretting my decision of buying a Mac! – Iqra. May 04 '20 at 23:45
  • The first time I had to run lxml, I added the line `import lxml` to my script and then it ran. – TobyPython Feb 10 '21 at 19:22
108

I'd prefer the built-in Python HTML parser: no install, no dependencies.

soup = BeautifulSoup(s, "html.parser")
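
For instance, a tiny self-contained check (the HTML string here is just a placeholder) works with nothing beyond bs4 itself:

from bs4 import BeautifulSoup

html = "<html><body><p>Hello, world</p></body></html>"
soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no extra install required
print(soup.p.get_text())                   # -> Hello, world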

– Ernst
  • Although this answer doesn't answer the question directly, it does provide a potentially better alternative. I had no preference for lxml and I changed everything to html.parser and it worked. I'd rather carry forward with something that works out of the box than drag on unnecessary technical debt. – donkz Mar 25 '21 at 14:48
  • Sometimes the html parser doesn't do the job; some pages require the XML parser. – Luís Henrique Martins Mar 17 '22 at 15:53
64

For basic out-of-the-box Python with bs4 installed, you can process your XML with:

soup = BeautifulSoup(html, "html5lib")

If, however, you want to use formatter='xml', then you need to:

pip3 install lxml

soup = BeautifulSoup(html, features="xml")
– Tim Seed
  • On a newly spun up remote server, html5lib didn't work out of the box for me. I still had to do a `pip install html5lib`, after which everything worked fine. – petercoles Dec 14 '19 at 14:00
  • Didn't work for me: `bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?` If I change it to `html.parser` it works. – 8bitjunkie May 22 '20 at 20:29
52

Run these three commands to make sure that you have all the relevant packages installed:

pip install bs4
pip install html5lib
pip install lxml

Then restart your Python IDE, if needed.

That should take care of anything related to this issue.
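
To confirm that the installs landed in the interpreter you are actually running, a quick sanity check might look like this (the markup is just a placeholder):

# Each import fails loudly if the corresponding package is missing
import bs4
import html5lib
import lxml

from bs4 import BeautifulSoup
print(BeautifulSoup("<p>ok</p>", "lxml").p.text)      # exercises the lxml builder
print(BeautifulSoup("<p>ok</p>", "html5lib").p.text)  # exercises the html5lib builder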

– Pikamander2
45

Actually, three of the options mentioned by others work.

# 1.
soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

# 2.
pip install lxml
soup_object = BeautifulSoup(markup, "lxml")  # C-based parser

# 3.
pip install html5lib
soup_object = BeautifulSoup(markup, "html5lib")  # pure-Python parser
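
If you would rather not hard-code one of them, a small fallback along these lines (just a sketch; the make_soup name is mine) tries lxml first and drops back to the built-in parser when it isn't installed:

from bs4 import BeautifulSoup, FeatureNotFound

def make_soup(markup):
    # Prefer the lxml parser when available, otherwise use the stdlib one
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(markup, "html.parser")

soup = make_soup("<p>hello</p>")
print(soup.p.text)  # -> hello
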
– JayRizzo, 33Anika33
19

I am using Python 3.6 and I had the same error as in this post. After I ran the command:

python3 -m pip install lxml

it resolved my problem.

– Kinght 金, Bashar
  • In Docker it's also necessary to `apt install python-lxml` –  Oct 30 '19 at 12:41
  • I don't need to run `apt install python-lxml`, but perhaps this is image-dependent. It suffices for me to do `python3 -m pip install lxml`. – Jan-Åke Larsson Sep 09 '22 at 09:03
18

Install the lxml parser in your Python environment.

pip install lxml

Your problem will be resolved. You can also use the built-in Python parser for the same purpose:

soup = BeautifulSoup(s, "html.parser")

Note: The "HTMLParser" module was renamed to "html.parser" in Python 3.

– Shankar Vishnu
13

Instead of lxml, use html.parser; you can use this piece of code:

soup = BeautifulSoup(html, 'html.parser')
– Yogesh
  • `vendor.bs.bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html.parser. Do you need to install a parser library?` – alex Apr 18 '18 at 17:27
8

BeautifulSoup supports the built-in HTML parser by default, but if you want to use any other third-party Python parser, you need to install that external parser, such as lxml.

soup_object = BeautifulSoup(markup, "html.parser")  # Python's built-in HTML parser

But if you don't specify any parser as a parameter, you will get a warning that no parser was specified.

soup_object = BeautifulSoup(markup)  # warning: no parser explicitly specified

To use any other external parser, you need to install it and then specify it, like so:

pip install lxml

soup_object = BeautifulSoup(markup, 'lxml')  # C-based parser

External parsers have C or Python dependencies, which can bring both advantages and disadvantages.

– ah bon, Projesh Bhoumik
5

In my case, I had an outdated version of the lxml package. So I just updated it, and this fixed the issue:

sudo python3 -m pip install lxml --upgrade
– blizz
4

Running pip install lxml and then keeping xml in soup = BeautifulSoup(URL, "xml") did the job on Mac.

– zabop
3

I encountered the same issue. I found that the reason was a slightly outdated Python six package.

>>> import html5lib
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/html5lib/__init__.py", line 16, in <module>
    from .html5parser import HTMLParser, parse, parseFragment
  File "/usr/local/lib/python2.7/site-packages/html5lib/html5parser.py", line 2, in <module>
    from six import with_metaclass, viewkeys, PY3
ImportError: cannot import name viewkeys

Upgrading your six package will solve the issue:

sudo pip install six==1.10.0
– Qiao Yang
2

BS4 expects an HTML document by default, so it parses an XML document as HTML. Pass features="xml" as an argument to the constructor; that resolved my issue.
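
For example (a minimal sketch with placeholder XML; this path requires lxml to be installed):

from bs4 import BeautifulSoup

xml = "<items><item id='1'>first</item><item id='2'>second</item></items>"
soup = BeautifulSoup(xml, features="xml")             # uses lxml's XML tree builder
print([item.text for item in soup.find_all("item")])  # -> ['first', 'second']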

– ayan-cs
1

Some references use the first form below; use the second instead:

soup_object = BeautifulSoup(markup, 'html-parser')   # wrong: not a recognized parser name
soup_object = BeautifulSoup(markup, 'html.parser')   # correct
– nj2237
1

The error comes from the parser you are using. In general, if you have an HTML file/code then you should use html5lib (documentation can be found here), and if you have XML file/data then you should use lxml (documentation can be found here). You can use lxml for HTML file/code as well, but it sometimes gives an error like the one above, so it is better to choose the package based on the type of data/file, as in the sketch below. You can also use html.parser, which is a built-in module, but it also does not always work.

For more details about when to use which package, see the details here.
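
As a rough illustration (placeholder strings, and both html5lib and lxml assumed to be installed), the parser name simply follows the kind of markup:

from bs4 import BeautifulSoup

html_doc = "<html><body><h1>Title</h1></body></html>"
xml_doc = "<root><child>value</child></root>"

html_soup = BeautifulSoup(html_doc, "html5lib")  # lenient HTML parsing (pip install html5lib)
xml_soup = BeautifulSoup(xml_doc, "xml")         # XML parsing (pip install lxml)

print(html_soup.h1.text, xml_soup.child.text)    # -> Title value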

1

Leaving the parser parameter blank will result in a warning that the best available parser is being used:

soup = BeautifulSoup(html)

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html5lib"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

python --version Python 3.7.7

PyCharm 19.3.4 CE

– user176105
1

My solution was to remove lxml from conda and reinstall it with pip.

– MJimitater
1

I am using Python 3.8 in PyCharm. I assume that you had not installed "lxml" before you started working. This is what I did:

  1. Go to File -> Settings.
  2. Select "Python Interpreter" in the left menu bar of Settings.
  3. Click the "+" icon above the list of packages.
  4. Search for "lxml."
  5. Click "Install Package" at the bottom left of the "Available Package" window.
– Jd_mahmud
1

I fixed it with the changes below.

Before the change:

soup = BeautifulSoup(r.content, 'html5lib')
print(soup.prettify())

After the change:

soup = BeautifulSoup(r.content, features='html')
print(soup.prettify())

My code now works properly.

0

This method worked for me. I should mention that I was trying this in a virtual environment. First:

pip install --upgrade bs4

Secondly, I used:

html.parser

instead of

html5lib
– abbas abaei
0

You may want to double check that you're using the right interpreter if you have multiple versions of Python installed.

Once I chose the correct version of Python, lxml was found.
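
One quick way to check which interpreter is actually running (a small sketch) is:

import sys

# The path printed here is the interpreter executing this script; running
# "<that path> -m pip install lxml" installs the package into that same environment.
print(sys.executable)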