21

I have some XML in a unicode-string variable in Python as follows:

<?xml version='1.0' encoding='UTF-8'?>
<results preview='0'>
<meta>
<fieldOrder>
<field>count</field>
</fieldOrder>
</meta>
    <result offset='0'>
        <field k='count'>
            <value><text>6</text></value>
        </field>
    </result>
</results>

How do I extract the 6 in <value><text>6</text></value> using Python?

kalaracey
  • 815
  • 1
  • 12
  • 28

2 Answers2

22

With lxml:

import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
textelem = root.find('result/field/value/text')
print textelem.text

Edit: But I imagine there could be more than one result...

import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
results = root.findall('result')
textnumbers = [r.find('field/value/text').text for r in results]
Colin Dunklau
  • 3,001
  • 1
  • 20
  • 19
8

BeautifulSoup is the most simple way to parse XML as far as I know...

And assume that you have read the introduction, then just simply use:

soup = BeautifulSoup('your_XML_string')
print soup.find('text').string
Thiem Nguyen
  • 6,345
  • 7
  • 30
  • 50
  • 1
    This finds only the first `` element, regardless of position. – Colin Dunklau Jul 05 '12 at 19:36
  • 2
    I had forgotten about BeautifulSoup! Didn't know that it could parse XML. Actually, I looked on their docs and you parse xml by adding an extra parameter of 'xml', i.e., `soup = BeautifulSoup('your_XML_string', 'xml')` – kalaracey Jul 05 '12 at 19:58