
Python version and device used

  • Python 2.7.5
  • Mac OS X 10.7.5
  • BeautifulSoup 4.2.1

I'm following the BeautifulSoup tutorial, but when I try to parse an XML page using the lxml library I get the following error:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml,xml. Do you need to install a parser library?

I am sure I have already installed lxml, using every method I know: easy_install, pip, port, etc. To check whether lxml is installed, I added this line to my code:

import lxml

Python runs that line without complaint, but then displays the same error message again, at the same line as before.

So I am fairly sure that lxml is installed, just not installed correctly. I decided to uninstall it and then reinstall it using a 'correct' method. But when I type

easy_install -m lxml

I get the following output:

Searching for lxml
Best match: lxml 3.2.1
Processing lxml-3.2.1-py2.7-macosx-10.6-intel.egg

Using /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.6-intel.egg

Because this distribution was installed --multi-version, before you can
import modules from this package in an application, you will need to
'import pkg_resources' and then use a 'require()' call similar to one of
these examples, in order to select the desired version:

pkg_resources.require("lxml")  # latest installed version
pkg_resources.require("lxml==3.2.1")  # this exact version
pkg_resources.require("lxml>=3.2.1")  # this version or higher

Processing dependencies for lxml
Finished processing dependencies for lxml

So I don't know how to continue the uninstall. I have read many posts about this issue on Google, but I still can't find any useful information.


Here is my source code:

import sys
import mechanize
from bs4 import BeautifulSoup
import lxml

class count:
    def __init__(self,protein):
        self.proteinCode = protein
        self.br = mechanize.Browser()

    def first_search(self):
        #Test 0
        # ['lxml','xml'] asks bs4 for lxml's XML tree builder; this is the
        # line that raises bs4.FeatureNotFound
        soup = BeautifulSoup(self.br.open("http://www.ncbi.nlm.nih.gov/protein/21225921?report=genbank&log$=prottop&blast_rank=1&RID=YGJHMSET015"), ['lxml','xml'])
        return soup

if __name__=='__main__':
    proteinCode = sys.argv[1]
    gogogo = count(proteinCode)
    gogogo.first_search()

Questions

  1. How can I uninstall lxml?
  2. How can I install lxml 'correctly'? How do I know that it is correctly installed?
Anatoly
Mark23333

4 Answers


I am using BeautifulSoup 4.3.2 and OS X 10.6.8. I also have a problem with improperly installed lxml. Here are some things that I found out:

First of all, check this related question: Removed MacPorts, now Python is broken

Now, in order to check which builders for BeautifulSoup 4 are installed, try

>>> import bs4
>>> bs4.builder.builder_registry.builders

If you don't see the builder you want, then it is either not installed or failed to load, and you will get the error above ("Couldn't find a tree builder...").
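BeautifulSoup resolves the features you pass (such as the question's ['lxml','xml']) through this same registry, and, as far as I can tell from the bs4 source, raises FeatureNotFound when the registry's lookup() returns None. Here is a minimal sketch of that check. It assumes only that bs4.builder exposes builder_registry as shown above. find_builder is a hypothetical helper name, not part of bs4.

```python
def find_builder(*features):
    """Return the class name of the tree builder bs4 would pick for
    `features`, or None if nothing matches (the FeatureNotFound case)."""
    try:
        from bs4.builder import builder_registry
    except ImportError:
        # bs4 itself is not importable from this interpreter.
        return None
    builder_class = builder_registry.lookup(*features)
    return None if builder_class is None else builder_class.__name__


if __name__ == "__main__":
    # The same features the question passes to BeautifulSoup.
    print(find_builder("lxml", "xml"))
```

If this prints None while `import lxml` succeeds, the lxml builder did not register. The next checks show why.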

Also, just because you can import lxml doesn't mean that everything is working.

Try

>>> import lxml
>>> import lxml.etree
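The following sketch runs both imports and records the error each one raises, if any. The point is that the lxml package itself can import fine while lxml.etree, the compiled extension module that does the actual parsing, fails to load. check_lxml is a hypothetical helper name.

```python
import importlib


def check_lxml():
    """Import lxml and lxml.etree separately.

    Returns a dict mapping each module name to None (imported fine)
    or to the message of the ImportError it raised."""
    results = {}
    for name in ("lxml", "lxml.etree"):
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in check_lxml().items():
        print(name, "OK" if error is None else "FAILED: " + error)
```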

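To understand what's going on, go to the bs4 installation and look inside the bs4.builder package (if bs4 was installed as a zipped egg, extract it first: eggs are zip archives, while a source tarball is unpacked with tar -xvzf). Inside it you should see files such as _lxml.py and _html5lib.py. So you can also try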

>>> import bs4.builder._htmlparser
>>> import bs4.builder._lxml
>>> import bs4.builder._html5lib

If there is a problem, you will see why a particular module cannot be loaded. Notice how, at the end of builder/__init__.py, bs4 imports all of those modules and silently ignores any that fail to load:

# Builders are registered in reverse order of priority, so that custom
# builder registrations will take precedence. In general, we want lxml
# to take precedence over html5lib, because it's faster. And we only
# want to use HTMLParser as a last result.
from . import _htmlparser
register_treebuilders_from(_htmlparser)
try:
    from . import _html5lib
    register_treebuilders_from(_html5lib)
except ImportError:
    # They don't have html5lib installed.
    pass
try:
    from . import _lxml
    register_treebuilders_from(_lxml)
except ImportError:
    # They don't have lxml installed.
    pass
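Because this loader swallows the ImportError, the reason a builder is missing never reaches you. The sketch below imports the same three modules (names taken from the imports above) and keeps the full traceback for each failure. diagnose_builders is a hypothetical name, and module names may differ on other bs4 versions.

```python
import importlib
import traceback

# Builder modules listed in the answer above.
BUILDER_MODULES = (
    "bs4.builder._htmlparser",
    "bs4.builder._lxml",
    "bs4.builder._html5lib",
)


def diagnose_builders(modules=BUILDER_MODULES):
    """Try each module and map its name to "ok" or to the traceback
    of the ImportError, instead of silently skipping it like bs4 does."""
    report = {}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except ImportError:
            report[name] = traceback.format_exc()
    return report


if __name__ == "__main__":
    for name, status in diagnose_builders().items():
        print(name, "->", status)
```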
Community
Sergey Orshanskiy
  • The suggestion at the related question (http://stackoverflow.com/questions/14153221/removed-macports-now-python-is-broken) to uninstall and re-install resolved the issue for me. – D. Savitt Nov 30 '13 at 02:05
  • Since `lxml` was missing on my machine, performing `sudo pip install lxml` solved the issue for me. – Stefan Schmidt Jun 22 '14 at 18:04
  • This step may also be necessary when installing lxml: http://stackoverflow.com/questions/19548011/cannot-install-lxml-on-mac-os-x-10-9 – taylorc93 Jul 22 '15 at 15:47
  • lxml was missing. After installing it, it worked :) Thank you! – Rayon Aug 06 '20 at 15:11

If you are using Python 2.7 on Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml 

Test it like this:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
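As the first answer points out, a successful `import lxml` alone doesn't prove that BeautifulSoup can use it. A slightly stronger smoke test, assuming bs4 is installed, parses a tiny XML string with the "xml" feature that the question relies on. smoke_test is a hypothetical name. It returns False rather than raising when bs4 or the XML builder is unavailable.

```python
def smoke_test():
    """True if BeautifulSoup can parse XML through lxml, else False."""
    try:
        from bs4 import BeautifulSoup, FeatureNotFound
    except ImportError:
        # bs4 is not installed for this interpreter.
        return False
    try:
        soup = BeautifulSoup("<a><b>1</b></a>", "xml")
    except FeatureNotFound:
        # Same error as in the question: no XML tree builder registered.
        return False
    return soup.find("b").text == "1"


if __name__ == "__main__":
    print("lxml XML builder usable:", smoke_test())
```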
Mona Jalal

FWIW, I ran into a similar problem (Python 3.6, OS X 10.12.6) and was able to solve it simply by running the commands below (the first command just shows that I was working in a conda virtualenv):

$ source activate ml-general
$ pip uninstall lxml
$ pip install lxml

I tried more complicated things first, because BeautifulSoup was working correctly with an identical command through Jupyter+IPython, but not through PyCharm's terminal in the same virtualenv. Simply reinstalling lxml as above solved the problem.
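When the same code works in one tool (Jupyter) but not in another (PyCharm), one likely cause is that the two are not running the same interpreter or loading the same lxml. The sketch below reports which Python executable is running and which file a module is imported from. Run it in both places and compare the output. environment_report is a hypothetical helper name.

```python
import importlib
import sys


def environment_report(module_name="lxml"):
    """Return the running interpreter and the file `module_name` loads from."""
    try:
        module = importlib.import_module(module_name)
        location = getattr(module, "__file__", None)
    except ImportError as exc:
        location = "not importable: %s" % exc
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        module_name: location,
    }


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```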

basse

apt-get on Debian/Ubuntu: sudo apt-get install python3-lxml

For Mac OS X, a MacPorts port of lxml is available. Try something like: sudo port install py27-lxml
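The package has to match the interpreter you actually run: py27-lxml is for Python 2.7, and python3-lxml is for Python 3. The sketch below derives candidate names from the running interpreter. It assumes the MacPorts py<major><minor>-<name> convention and the Debian python-/python3- convention hold for your system. lxml_package_names is hypothetical, so confirm the real name with your package manager before installing.

```python
import sys


def lxml_package_names(version_info=None):
    """Guess lxml package names for this interpreter (naming conventions assumed)."""
    major, minor = tuple(version_info or sys.version_info)[:2]
    return {
        # Assumed MacPorts convention: py<major><minor>-<name>, e.g. py27-lxml
        "macports": "py%d%d-lxml" % (major, minor),
        # Assumed Debian/Ubuntu convention: python-lxml (Py2), python3-lxml (Py3)
        "debian": "python-lxml" if major == 2 else "python3-lxml",
    }


if __name__ == "__main__":
    print(lxml_package_names())
```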

http://lxml.de/installation.html may be helpful.

Michael