
I have a Python script that parses an XML document using the ElementTree library. I can parse all of the data, including attribute data, but a few attributes have "xts:" as a prefix.

So:

var1 = child.attrib['abc']
var2 = child.attrib['xts:xyz']

When I run the script, it is able to collect the "abc" attribute data but the "xts:xyz" attribute data is null, despite the fact that there is content associated with that attribute.

It doesn't sound like ":" is a special character in Python that I need to escape. Any ideas?

user242153

1 Answer


The issue here is that xts is a namespace prefix. You don't need to escape it, but ElementTree has to be told which namespace the prefix maps to before it can resolve names that use it.

For example, this code (using XPath syntax in findall):

import xml.etree.ElementTree as ET

xmlStr = """<?xml version="1.0" encoding="UTF-8"?>
<stuff xmlns:xts="http://www.stackoverflow.com">
    <abc foo="bar">Baz</abc>
    <xts:xyz narf="poit">troz</xts:xyz>
</stuff>
"""    

namespaces = {"xts": "http://www.stackoverflow.com"}

root = ET.fromstring(xmlStr)

abcNode = root.findall("./abc", namespaces=namespaces)
xyzNode = root.findall("./xts:xyz", namespaces=namespaces)

Yields these results:

>>> print(abcNode[0].attrib)
{'foo': 'bar'}
>>> print(xyzNode[0].attrib)
{'narf': 'poit'}
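
The prefix mapping is a convenience. ElementTree itself identifies a namespaced element by its URI in braces followed by the local name, so the same lookup can be written without the namespaces argument. A minimal sketch, reusing the placeholder URI from the example above (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

xml_str = """<stuff xmlns:xts="http://www.stackoverflow.com">
    <abc foo="bar">Baz</abc>
    <xts:xyz narf="poit">troz</xts:xyz>
</stuff>
"""
root = ET.fromstring(xml_str)

# Spell out the namespace URI in braces instead of passing a prefix mapping.
xyz_nodes = root.findall("./{http://www.stackoverflow.com}xyz")

# The tag of a namespaced element is stored in the same {uri}local form.
print(xyz_nodes[0].tag)
print(xyz_nodes[0].attrib)
```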

For more discussion/details about parsing namespaces using ElementTree, you can refer to Parsing XML with namespace in Python via 'ElementTree'.

Edit in response to comment from OP:

Given this code (added to the code above, which supplies the import, etc.), which puts the colon in the attribute value of the xyz node:

xmlStr2 = """<?xml version="1.0" encoding="UTF-8"?>
<stuff>
    <abc foo="bar">Baz</abc>
    <xyz narf="xts:poit">troz</xyz>
</stuff>
"""

root2 = ET.fromstring(xmlStr2)

abcNode2 = root2.findall("./abc")
xyzNode2 = root2.findall("./xyz")

print("abc2 attrib: {0}".format(abcNode2[0].attrib))
print("xyz2 attrib: {0}".format(xyzNode2[0].attrib))

This outputs:

abc2 attrib: {'foo': 'bar'}
xyz2 attrib: {'narf': 'xts:poit'}

So ElementTree doesn't have an issue with parsing an attribute whose value contains a colon.
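
If the colon is in the attribute name rather than its value, as with xts:xyz in the question, the situation is different: ElementTree stores a namespaced attribute under a key of the form {uri}local, not under the prefixed name. A minimal sketch, assuming the xts prefix is bound to the placeholder URI used earlier and that the attribute is called xts:narf (both are illustrative and not taken from the OP's document):

```python
import xml.etree.ElementTree as ET

# Assumed layout: the element is unprefixed, only the attribute carries xts:.
xml_str3 = """<stuff xmlns:xts="http://www.stackoverflow.com">
    <abc foo="bar">Baz</abc>
    <xyz xts:narf="poit">troz</xyz>
</stuff>
"""

XTS_URI = "http://www.stackoverflow.com"  # placeholder URI, an assumption

root3 = ET.fromstring(xml_str3)
xyz_node3 = root3.find("./xyz")

# The parser replaces the prefix with the namespace URI in braces.
narf_key = "{%s}narf" % XTS_URI
narf_value = xyz_node3.attrib[narf_key]
print(narf_value)

# The literal prefixed name is not a key, so .get() returns None
# (and attrib['xts:narf'] would raise KeyError).
prefixed_value = xyz_node3.attrib.get("xts:narf")
print(prefixed_value)
```

In a real document, the URI to use is whatever the xmlns:xts declaration on the root (or an ancestor) element says.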

You mentioned in your comment that:

I still get a key error, regardless if I use xyzNode.attrib['poit'] or xyzNode.attrib['xts:poit']

I think the crux of that issue (at least with findall, which the code above uses) is that it returns a list of Element objects, even when only one Element matches; find, by contrast, returns a single Element, or None if nothing matches. The list is visible here:

>>> print(xyzNode2)
[<Element 'xyz' at 0x7f59bed39150>]

So in order to use attrib, you need to access an element within that list. You could use a for-in loop to loop over all of them and process them (or in this case the single one) accordingly, or if you know there's only one, you can just access it directly using a [0] subscript, as I did above.
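
The two ways of getting at the elements can be sketched as follows, reusing the sample document from the edit above (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

xml_str = """<stuff>
    <abc foo="bar">Baz</abc>
    <xyz narf="xts:poit">troz</xyz>
</stuff>
"""
root = ET.fromstring(xml_str)

# findall returns a list, so iterate (or index) before using attrib.
narf_values = [node.attrib["narf"] for node in root.findall("./xyz")]
print(narf_values)

# find returns the first matching Element, or None when nothing matches,
# so check for None before touching attrib.
first_xyz = root.find("./xyz")
missing = root.find("./nope")
if first_xyz is not None:
    print(first_xyz.attrib["narf"])
```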

khampson
  • Thanks, very helpful. I still have an issue, however: the XML doc is laid out in a different way. The node itself doesn't have the xts prefix, just the attribute. So in your example it would be troz. If I find the node using "find" and add the namespace=namespace and then try to pull the attribute I still get a key error, regardless if I use xyzNode.attrib['poit'] or xyzNode.attrib['xts:poit'] – user242153 Aug 14 '14 at 16:46