How to use XPath in Python?

Question

What are the libraries that support XPath? Is there a full implementation? How is the library used? Where is its website?

I have this sneaky suspicion that the answers to this question are a bit stale now. — Warren P, Nov 26 '12 at 20:44
The answer by @gringo-suave looks like a good update. http://stackoverflow.com/a/13504511/1450294 — Michael Scheper, Jun 18 '13 at 08:09
Scrapy offers [XPath selectors](https://docs.scrapy.org/en/latest/topics/selectors.html). — cs95, Apr 20 '19 at 20:04
As @WarrenP says, most of the answers here are extremely stale old Python-2.x, really out of date. Maybe this question should be tagged [tag:python-2.x] — smci, Jun 08 '20 at 10:00

score 136 · Answer 1 · edited Feb 07 '16 at 18:16

libxml2 has a number of advantages:

Compliance to the spec
Active development and a community participation
Speed. This is really a python wrapper around a C implementation.
Ubiquity. The libxml2 library is pervasive and thus well tested.

Downsides include:

Compliance to the spec. It's strict. Things like default namespace handling are easier in other libraries.
Use of native code. This can be a pain depending on your how your application is distributed / deployed. RPMs are available that ease some of this pain.
Manual resource handling. Note in the sample below the calls to freeDoc() and xpathFreeContext(). This is not very Pythonic.

If you are doing simple path selection, stick with ElementTree ( which is included in Python 2.5 ). If you need full spec compliance or raw speed and can cope with the distribution of native code, go with libxml2.

Sample of libxml2 XPath Use

import libxml2

doc = libxml2.parseFile("tst.xml")
ctxt = doc.xpathNewContext()
res = ctxt.xpathEval("//*")
if len(res) != 2:
    print "xpath query: wrong node set size"
    sys.exit(1)
if res[0].name != "doc" or res[1].name != "foo":
    print "xpath query: wrong node set value"
    sys.exit(1)
doc.freeDoc()
ctxt.xpathFreeContext()

Sample of ElementTree XPath Use

from elementtree.ElementTree import ElementTree
mydoc = ElementTree(file='tst.xml')
for e in mydoc.findall('/foo/bar'):
    print e.get('title').text

using python 2.7.10 on osx I had to import ElementTree as `from xml.etree.ElementTree import ElementTree` — Ben Page, Jan 11 '16 at 14:34
because it is a C wrapper you might find difficulty deploying it to AWS Lambda unless you compile on an EC2 instance or a Docker image of AWS Linux — CpILL, Jul 12 '18 at 13:51
libxml2 isn't built-in to Python, is it? How would one do this with the built-in `xml` libraries? — Eric Dand, Sep 29 '22 at 23:18

score 86 · Answer 2 · edited Jan 10 '13 at 19:26

86

The lxml package supports xpath. It seems to work pretty well, although I've had some trouble with the self:: axis. There's also Amara, but I haven't used it personally.

edited Jan 10 '13 at 19:26

MvG

57,380
22
148
276

answered Aug 12 '08 at 11:40

James Sulak

31,389
11
53
57

1

amara's pretty nice, and one doesn't always need xpath. – gatoatigrado Dec 05 '09 at 05:35
3

Please add some basic details about how to use XPath with lxml. – jpmc26 Oct 30 '18 at 17:57
There's a standard library solution. I prefer fewer dependencies. See Gringo Suave's answer. – Ted Shaneyfelt Mar 30 '21 at 05:29

Gringo Suave · Answer 3 · 2022-09-30T16:28:10.203

Sounds like an lxml advertisement in here. ;) ElementTree is included in the std library. Under 2.6 and below its xpath is pretty weak, but in 2.7+ and 3.x much improved:

import xml.etree.ElementTree as ET

root = ET.parse(filename)
result = ''

for elem in root.findall('.//child/grandchild'):
    # How to make decisions based on attributes:
    if elem.attrib.get('name') == 'foo':
        result = elem.text
        break

score 40 · Answer 4 · answered Nov 13 '09 at 23:11

Use LXML. LXML uses the full power of libxml2 and libxslt, but wraps them in more "Pythonic" bindings than the Python bindings that are native to those libraries. As such, it gets the full XPath 1.0 implementation. Native ElemenTree supports a limited subset of XPath, although it may be good enough for your needs.

score 32 · Answer 5 · edited Jan 27 '13 at 14:10

32

Another option is py-dom-xpath, it works seamlessly with minidom and is pure Python so works on appengine.

import xpath
xpath.find('//item', doc)

edited Jan 27 '13 at 14:10

answered Jan 23 '10 at 09:30

Sam

6,240
4
42
53

3

Easier than lxml and libxml2 if you're already working with minidom. Works beautifully and is more "Pythonic". The `context` in the `find` function let you use another xpath result as a new search context. – Ben Oct 24 '11 at 20:19
5

I too have been using py-dom-xpath as I write a plugin, because it's pure python. But I don't think it's maintained anymore, and be aware of this bug ("Cannot access an element whose name is 'text'"): https://code.google.com/p/py-dom-xpath/issues/detail?id=8 – Jon Coombs Feb 06 '14 at 20:52
3

[py-dom-xpath seems to have been mothballed years ago in 2010](https://code.google.com/archive/p/py-dom-xpath/), please at minimum edit this into your answer. – smci Jun 08 '20 at 10:03

score 15 · Answer 6 · answered Aug 23 '10 at 13:00

15

You can use:

PyXML:

from xml.dom.ext.reader import Sax2
from xml import xpath
doc = Sax2.FromXmlFile('foo.xml').documentElement
for url in xpath.Evaluate('//@Url', doc):
  print url.value

libxml2:

import libxml2
doc = libxml2.parseFile('foo.xml')
for url in doc.xpathEval('//@Url'):
  print url.content

answered Aug 23 '10 at 13:00

0xAX

20,957
26
117
206

1

when I try the PyXML code, I got `ImportError: No module named ext` from `from xml.dom.ext.reader import Sax2` – Aminah Nuraini Nov 30 '15 at 19:24

Aminah Nuraini · Answer 7 · 2016-04-25T01:59:42.000

11

You can use the simple soupparser from lxml

Example:

from lxml.html.soupparser import fromstring

tree = fromstring("<a>Find me!</a>")
print tree.xpath("//a/text()")

edited Apr 25 '16 at 01:59

answered Nov 15 '15 at 05:31

Aminah Nuraini

18,120
8
90
108

What difference does using soupparser make? – Padraic Cunningham Oct 20 '16 at 23:59
It's just an alternative – Aminah Nuraini Oct 21 '16 at 04:43

jkp · Answer 8 · 2009-01-12T22:57:39.417

9

The latest version of elementtree supports XPath pretty well. Not being an XPath expert I can't say for sure if the implementation is full but it has satisfied most of my needs when working in Python. I've also use lxml and PyXML and I find etree nice because it's a standard module.

NOTE: I've since found lxml and for me it's definitely the best XML lib out there for Python. It does XPath nicely as well (though again perhaps not a full implementation).

edited Jan 12 '09 at 22:57

answered Aug 14 '08 at 09:48

jkp

78,960
28
103
104

7

ElementTree's XPath support is currently minimal at best. There are huge gaping holes in functionality, such as the lack of attribute selectors, no non-default axes, no child indexing, etc. Version 1.3 (in alpha) adds some of these features, but it's still an unashamedly partial implementation. – James Brady Jan 09 '09 at 23:26

eLRuLL · Answer 9 · 2018-04-17T11:32:54.933

9

If you want to have the power of XPATH combined with the ability to also use CSS at any point you can use parsel:

>>> from parsel import Selector
>>> sel = Selector(text=u"""<html>
        <body>
            <h1>Hello, Parsel!</h1>
            <ul>
                <li><a href="http://example.com">Link 1</a></li>
                <li><a href="http://scrapy.org">Link 2</a></li>
            </ul
        </body>
        </html>""")
>>>
>>> sel.css('h1::text').extract_first()
'Hello, Parsel!'
>>> sel.xpath('//h1/text()').extract_first()
'Hello, Parsel!'

edited Apr 17 '18 at 11:32

answered Dec 16 '17 at 22:16

eLRuLL

18,488
9
73
99

how should my Xpath look like if I want to get "Link 1" and "Link 2"? – weefwefwqg3 Apr 17 '18 at 05:10
1

for getting the text, it should be something like `//li/a/text()` – eLRuLL Apr 17 '18 at 11:34

score 3 · Answer 10 · answered Aug 23 '10 at 12:57

3

Another library is 4Suite: http://sourceforge.net/projects/foursuite/

I do not know how spec-compliant it is. But it has worked very well for my use. It looks abandoned.

answered Aug 23 '10 at 12:57

codeape

97,830
24
159
188

David Joyner · Answer 11 · 2008-08-12T20:50:44.893

PyXML works well.

You didn't say what platform you're using, however if you're on Ubuntu you can get it with sudo apt-get install python-xml. I'm sure other Linux distros have it as well.

If you're on a Mac, xpath is already installed but not immediately accessible. You can set PY_USE_XMLPLUS in your environment or do it the Python way before you import xml.xpath:

if sys.platform.startswith('darwin'):
    os.environ['PY_USE_XMLPLUS'] = '1'

In the worst case you may have to build it yourself. This package is no longer maintained but still builds fine and works with modern 2.x Pythons. Basic docs are here.

How to use XPath in Python?

11 Answers11

Example:

Linked