Is there an easy way to manipulate XML documents in Python?

Question

I have done a little research around the matter, but haven't really been able to come up with anything useful. What I need is to not just parse and read, but actually manipulate XML documents in python, similar to the way JavaScript is able to manipulate HTML documents.

Allow me to give an example. say I have the following XML document:

<library>
    <book id=123>
        <title>Intro to XML</title>
        <author>John Smith</author>
        <year>1996</year>
    </book>
    <book id=456>
        <title>XML 101</title>
        <author>Bill Jones</author>
        <year>2000</year>
    </book>
    <book id=789>
        <title>This Book is Unrelated to XML</title>
        <author>Justin Tyme</author>
        <year>2006</year>
    </book>
</library>

I need a way both to retrieve an element, either using XPath or with a "pythonic" method, as outlined here, but I also need to be able to manipulate the document, such as below:

>>>xml.getElement('id=123').title="Intro to XML v2"
>>>xml.getElement('id=123').year="1998"

If anyone is aware of such a tool in Python, please let me know. Thanks!

`minidom` might be considered slow, but it is readable and resembles JavaScript. — Blender, Nov 01 '11 at 13:34

score 15 · Answer 1 · edited May 23 '17 at 12:16

15

If you want to avoid installing lxml.etree, you can use xml.etree from the standard library.

Here is Acorn's answer ported to xml.etree:

import xml.etree.ElementTree as et  # was: import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.findall('fruit'): # was: tree.xpath('//fruit')
    fruit.text = 'rotten %s' % (fruit.text,)

print et.tostring(tree) # removed argument: prettyprint

note: I would have put this as a comment on Acorn's answer if I could have done so in a clear manner. If you like this answer, give the upvote to Acorn.

edited May 23 '17 at 12:16

Community

1
1

answered Nov 01 '11 at 14:16

Steven Rumbalski

44,786
9
89
119

does `xml.etree` a method for parsing the tree with xpath? – ewok Nov 01 '11 at 14:34
ElementTree only supports a subset of XPath. – Acorn Nov 01 '11 at 14:38

Acorn · Accepted Answer · 2011-11-01T14:01:41.383

13

lxml allows you to select elements using XPath, and also manipulate those elements.

import lxml.etree as et

xmltext = """
<root>
    <fruit>apple</fruit>
    <fruit>pear</fruit>
    <fruit>mango</fruit>
    <fruit>kiwi</fruit>
</root>
"""

tree = et.fromstring(xmltext)

for fruit in tree.xpath('//fruit'):
    fruit.text = 'rotten %s' % (fruit.text,)

print et.tostring(tree, pretty_print=True)

Result:

<root>
    <fruit>rotten apple</fruit>
    <fruit>rotten pear</fruit>
    <fruit>rotten mango</fruit>
    <fruit>rotten kiwi</fruit>
</root>

edited Nov 01 '11 at 14:01

answered Nov 01 '11 at 13:30

Acorn

49,061
27
133
172

when attempting this, I get an error on `import lxml.etree as et`, saying that there is `No module named lxml.etree` – ewok Nov 01 '11 at 13:45
lxml is not part of the standard library, you need to [install it](http://lxml.de/installation.html) – omz Nov 01 '11 at 13:51
Also check out [lxml.objectify](http://lxml.de/objectify.html) -- a great, different way to deal with the tree... – Ryan Roemer Nov 01 '11 at 14:23
can you explain how to install lxml? I have attempted following the instruction on the web page with absolutely no success. I am running Windows 7 64 bit – ewok Nov 01 '11 at 14:28
Just install using the relevant executable from here: http://pypi.python.org/pypi/lxml/2.3 – Acorn Nov 01 '11 at 14:35

Is there an easy way to manipulate XML documents in Python?

2 Answers2