python lxml - how to get the value of a subelement in XML

Question

The XML:

<tree>
  <row>
     <a>This is a</a>
     <b>This is b</b>
  </row>
</tree>

I have looked through many solutions on the web and tried a number of them. None of the following worked for me:

from lxml import etree

# etree.XML() expects XML text, not a file name, so parse the file instead
tree = etree.parse('file.xml').getroot()
print(tree[0].findtext('a'))         # None
print(tree[0].find('a'))             # None
print(tree[0].xpath('a'))            # [] (xpath returns an empty list, not None)
print(tree[0].xpath('/a'))           # []
print(tree[0].xpath('//a'))          # []
print(tree.xpath('//row/a'))         # []
print(tree.xpath('//row/a/text()'))  # []

The only thing that worked was tree[0][0].text, but my actual XML contains 25 subelements, and repeating that 25 times isn't clean code.

Does anyone know what I am doing wrong?

I also know about BeautifulSoup, but after testing it I concluded that it does not fit my case because of its performance (benchmark here).

Thanks!

score 1 · Answer 1 · answered Feb 26 '18 at 12:23

You can use .iter and a for loop.

for row_node in tree.iter('row'):
    a_node = row_node.find('a')
    b_node = row_node.find('b')
    print(a_node.text)
    print(b_node.text)

# This is a
# This is b
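
When a row has many subelements (the question mentions 25), naming each one in a separate find call gets repetitive. A minimal sketch, assuming the non-namespaced XML from the question: it collects every child of each row into a dict keyed by tag name. The helper name rows_as_dicts and the inline XML constant are made up for this illustration, and the ElementTree fallback is only there so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Inline copy of the question's sample XML (no namespace)
XML = b"""<tree>
  <row>
     <a>This is a</a>
     <b>This is b</b>
  </row>
</tree>"""


def rows_as_dicts(root):
    """Return one dict per <row>, mapping each child tag to its text."""
    return [{child.tag: child.text for child in row} for row in root.iter('row')]


root = etree.fromstring(XML)
rows = rows_as_dicts(root)
print(rows)
# [{'a': 'This is a', 'b': 'This is b'}]
```

Each value can then be read as rows[0]['a'] without indexing by position.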


DeepSpace

score 0 · Accepted Answer · answered Feb 26 '18 at 12:48

I finally figured out my problem: it was the XML namespace. I wasn't doing anything with it, so I thought I didn't need to look at it.

The XML was slightly different:

<tree xmlns="http://www.schemas.net/schema/MyXMLSchema">
  <row>
     <a>This is a</a>
     <b>This is b</b>
  </row>
</tree>
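
To see why the unqualified lookups came back empty, it helps to print the tag of a child element. The sketch below uses an inline copy of the XML above (the XML constant and variable names are illustrative only) and shows that a default namespace becomes part of every tag name, so a search for plain 'a' finds nothing. The ElementTree fallback is only there so the example runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

XML = b"""<tree xmlns="http://www.schemas.net/schema/MyXMLSchema">
  <row>
     <a>This is a</a>
     <b>This is b</b>
  </row>
</tree>"""

root = etree.fromstring(XML)
row = root[0]

# The default namespace is folded into each tag as "{uri}name"
first_tag = row[0].tag
plain = row.findtext('a')
qualified = row.findtext('{http://www.schemas.net/schema/MyXMLSchema}a')

print(first_tag)   # {http://www.schemas.net/schema/MyXMLSchema}a
print(plain)       # None
print(qualified)   # This is a
```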

So what I needed to do was add the namespace to the tag name passed to find. To do this dynamically, I used the answer from another question, like this:

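from lxml import etree

# etree.XML() expects XML text, not a file name, so parse the file instead
tree = etree.parse('file.xml').getroot()
namespace = tree.xpath('namespace-uri(.)')  # namespace URI of the root element
for row in tree:
    print(row.findtext('{%s}a' % namespace))
    print(row.findtext('{%s}b' % namespace))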

# This is a
# This is b
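
An alternative to formatting '{%s}a' by hand is to pass a prefix-to-URI mapping through the namespaces argument of find/findtext. This is a sketch, not the approach used above: the prefix 'ns' is an arbitrary choice, the URI is read from the root's tag rather than via xpath, and it assumes the root element is namespaced. The ElementTree fallback is only there so the example runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

XML = b"""<tree xmlns="http://www.schemas.net/schema/MyXMLSchema">
  <row>
     <a>This is a</a>
     <b>This is b</b>
  </row>
</tree>"""

root = etree.fromstring(XML)

# Assumption: the root tag has the form "{uri}tree"; take the part in braces
uri = root.tag[1:].partition('}')[0]
ns = {'ns': uri}  # 'ns' is an arbitrary prefix chosen for this sketch

values = [
    (row.findtext('ns:a', namespaces=ns), row.findtext('ns:b', namespaces=ns))
    for row in root.iter('{%s}row' % uri)
]
print(values)
# [('This is a', 'This is b')]
```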

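If the tree may contain more than just rows, iterating with tree.iter is indeed, as DeepSpace pointed out, the better option. Note that with a namespace the tag must be qualified there too, e.g. tree.iter('{%s}row' % namespace).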
