Questions tagged [lxml]

lxml is a full-featured, high-performance Python library for processing XML and HTML.

Questions that concern the lxml Python library should have this tag. Per the lxml website, "The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt." The library's lxml.etree package is used for XML processing. lxml can also hand broken HTML to BeautifulSoup through lxml.html.soupparser, and to html5lib, which implements the HTML5 parsing algorithm, through lxml.html.html5parser.
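As a rough illustration of the kind of XML processing lxml.etree is used for, here is a minimal sketch. The sample document and the function name ripe_fruit are hypothetical, made up for this example. The sketch sticks to the ElementTree-compatible subset of the API, so it falls back to the standard library's xml.etree.ElementTree if lxml is not installed.

```python
# Prefer lxml; fall back to the standard library's ElementTree, which
# supports the small subset of the API used in this sketch.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical sample document, not taken from any question on this page.
SAMPLE = (
    "<fruits>"
    "<fruit state='ripe'>apple</fruit>"
    "<fruit state='rotten'>pear</fruit>"
    "</fruits>"
)


def ripe_fruit(xml_text):
    """Return the text of every <fruit> child whose state attribute is 'ripe'."""
    root = etree.fromstring(xml_text)
    # findall() accepts a limited XPath-like path, including attribute predicates.
    return [element.text for element in root.findall("fruit[@state='ripe']")]


print(ripe_fruit(SAMPLE))
```

With lxml itself, root.xpath(...) would give access to full XPath 1.0; findall() is used here only so the sketch also runs without lxml.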

Links:

https://lxml.de/ - Contains API documentation and tutorials

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/ - IBM developerWorks page on lxml

5412 questions
454
votes
12 answers

How to install lxml on Ubuntu

I'm having difficulty installing lxml with easy_install on Ubuntu 11. When I type $ easy_install lxml I get: Searching for lxml Reading http://pypi.python.org/simple/lxml/ Reading http://codespeak.net/lxml Best match: lxml 2.3 Downloading…
Eric Wilson
  • 57,719
  • 77
  • 200
  • 270
424
votes
21 answers

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

... soup = BeautifulSoup(html, "lxml") File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 152, in __init__ % ",".join(features)) bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to…
user3773048
  • 5,839
  • 4
  • 18
  • 22
311
votes
28 answers

libxml install error using pip

This is my error: (mysite)zjm1126@zjm1126-G41MT-S2:~/zjm_test/mysite$ pip install lxml Downloading/unpacking lxml Running setup.py egg_info for package lxml Building lxml version 2.3. Building without Cython. ERROR: /bin/sh:…
zjm1126
  • 34,604
  • 53
  • 121
  • 166
289
votes
3 answers

builtins.TypeError: must be str, not bytes

I've converted my scripts from Python 2.7 to 3.2, and I have a bug. # -*- coding: utf-8 -*- import time from datetime import date from lxml import etree from collections import OrderedDict # Create the root element page =…
user278618
  • 19,306
  • 42
  • 126
  • 196
245
votes
24 answers

Cannot install Lxml on Mac OS X 10.9

I want to install Lxml so I can then install Scrapy. When I updated my Mac today it wouldn't let me reinstall lxml, I get the following error: In file included from…
David O'Regan
  • 2,684
  • 2
  • 13
  • 12
127
votes
3 answers

How to select following sibling/XML tag using XPath

I have an HTML file (from Newegg) and their HTML is organized like below. All of the data in their specifications table is 'desc' while the titles of each section are in 'name.' Below are two examples of data from Newegg pages.
Corey Farwell
  • 1,856
  • 3
  • 14
  • 19
117
votes
1 answer

SyntaxError of Non-ASCII character

I am trying to parse XML which contains some non-ASCII characters. The code looks like below: from lxml import etree from lxml import objectify content = u'
Order date                            :…
OpenCurious
  • 2,916
  • 5
  • 22
  • 25
108
votes
5 answers

src/lxml/etree_defs.h:9:31: fatal error: libxml/xmlversion.h: No such file or directory

I am running the following command for installing the packages in that file " pip install -r requirements.txt --download-cache=~/tmp/pip-cache". requirement.txt contains packages like # Data formats # ------------ PIL==1.1.7 #…
user2086641
  • 4,331
  • 13
  • 56
  • 96
107
votes
3 answers

pip is not able to install packages correctly: Permission denied error

I am trying to install lxml to install scrapy on my Mac (v 10.9.4) ╭─ishaantaylor@Ishaans-MacBook-Pro.local ~ ╰─➤ pip install lxml Downloading/unpacking lxml Downloading lxml-3.4.0.tar.gz (3.5MB): 3.5MB downloaded Running setup.py…
Ishaan Taylor
  • 1,817
  • 5
  • 17
  • 19
106
votes
7 answers

Installing lxml module in python

while running a python script, I got this error from lxml import etree ImportError: No module named lxml now I tried to install lxml sudo easy_install lmxl but it gives me the following error Building lxml version 2.3.beta1. NOTE: Trying to…
user563101
  • 1,155
  • 2
  • 7
  • 5
106
votes
2 answers

Flask example with POST

Suppose the following route which accesses an xml file to replace the text of a specific tag with a given xpath (?key=): @app.route('/resource', methods = ['POST']) def update_text(): # CODE Then, I would use cURL like this: curl -X POST…
bulkmoustache
  • 1,875
  • 3
  • 20
  • 24
104
votes
10 answers

How to Pretty Print HTML to a file, with indentation

I am using lxml.html to generate some HTML. I want to pretty print (with indentation) my final result into an html file. How do I do that? This is what I have tried and got till now import lxml.html as lh from lxml.html import builder as…
bcosynot
  • 5,653
  • 10
  • 34
  • 47
104
votes
15 answers

Get all text inside a tag in lxml

I'd like to write a code snippet that would grab all of the text inside the tag, in lxml, in all three instances below, including the code tags. I've tried tostring(getchildren()) but that would miss the text in between the tags. I didn't…
Kevin Burke
  • 61,194
  • 76
  • 188
  • 305
96
votes
6 answers

how to remove an element in lxml

I need to completely remove elements, based on the contents of an attribute, using python's lxml. Example: import lxml.etree as et xml=""" apple pear
ewok
  • 20,148
  • 51
  • 149
  • 254
71
votes
5 answers

lxml installation error ubuntu 14.04 (internal compiler error)

I am having problems with installing lxml. I have tried the solutions of the relative questions in this site and other sites but could not fix the problem. Need some suggestions/solution on this. I am providing the full log after executing pip…
salmanwahed
  • 9,450
  • 7
  • 32
  • 55