The issue here is that xts is a namespace prefix. You don't need to escape it, but you do need to tell ElementTree which namespace URI the prefix maps to; otherwise it can't resolve the prefixed tag.
For example, this code (using the XPath syntax supported by findall):
import xml.etree.ElementTree as ET
xmlStr = """<?xml version="1.0" encoding="UTF-8"?>
<stuff xmlns:xts="http://www.stackoverflow.com">
<abc foo="bar">Baz</abc>
<xts:xyz narf="poit">troz</xts:xyz>
</stuff>
"""
namespaces = {"xts": "http://www.stackoverflow.com"}
root = ET.fromstring(xmlStr)
abcNode = root.findall("./abc", namespaces=namespaces)
xyzNode = root.findall("./xts:xyz", namespaces=namespaces)
This yields the following results:
>>> print(abcNode[0].attrib)
{'foo': 'bar'}
>>> print(xyzNode[0].attrib)
{'narf': 'poit'}
For more discussion and details about parsing namespaces with ElementTree, see Parsing XML with namespace in Python via 'ElementTree'.
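As an alternative to passing a namespaces mapping, here is a minimal sketch (assuming Python 3, and reusing the first document without its XML declaration) that shows how ElementTree stores a prefixed tag internally: the prefix is replaced by the full URI in {uri}localname form, sometimes called Clark notation. findall also accepts that spelled-out form directly. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Same document as the first example, minus the XML declaration.
xml_str = """<stuff xmlns:xts="http://www.stackoverflow.com">
<abc foo="bar">Baz</abc>
<xts:xyz narf="poit">troz</xts:xyz>
</stuff>"""

root = ET.fromstring(xml_str)

# The second child is the xts:xyz element; its tag is stored as {URI}localname.
xyz_tag = root[1].tag
print(xyz_tag)  # {http://www.stackoverflow.com}xyz

# Spelling out the URI finds the element without any prefix mapping.
xyz_nodes = root.findall("./{http://www.stackoverflow.com}xyz")
print(xyz_nodes[0].attrib)  # {'narf': 'poit'}
```

The namespaces mapping is usually more readable, but knowing the {uri}localname form helps when inspecting .tag values or debugging a lookup that returns nothing.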
Edit in response to a comment from the OP:
Given this code (appended to the code above, so it reuses the import, etc.), which puts the colon in the value of the xyz node's attribute:
xmlStr2 = """<?xml version="1.0" encoding="UTF-8"?>
<stuff>
<abc foo="bar">Baz</abc>
<xyz narf="xts:poit">troz</xyz>
</stuff>
"""
root2 = ET.fromstring(xmlStr2)
abcNode2 = root2.findall("./abc")
xyzNode2 = root2.findall("./xyz")
print("abc2 attrib: {0}".format(abcNode2[0].attrib))
print("xyz2 attrib: {0}".format(xyzNode2[0].attrib))
This outputs:
abc2 attrib: {'foo': 'bar'}
xyz2 attrib: {'narf': 'xts:poit'}
So ElementTree has no trouble parsing an attribute value that contains a colon.
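The situation is different when the colon is in the attribute name rather than its value. The sketch below (assuming Python 3) uses a hypothetical xyz element that carries both kinds of attribute: a colon inside a value is just text, while a namespaced attribute name is expanded to {uri}localname, so neither 'narf' nor 'xts:narf' works as its key.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: xts:narf is a namespaced attribute NAME,
# while plain has a colon only in its VALUE (as in xmlStr2).
xml_str = """<stuff xmlns:xts="http://www.stackoverflow.com">
<xyz xts:narf="poit" plain="xts:poit">troz</xyz>
</stuff>"""

root = ET.fromstring(xml_str)
xyz = root.find("./xyz")

# A colon in the value needs no special handling; the key is the plain name.
print(xyz.attrib["plain"])  # xts:poit

# A prefixed attribute name is stored under the expanded {URI}localname key.
print(xyz.get("{http://www.stackoverflow.com}narf"))  # poit
print("xts:narf" in xyz.attrib)  # False
print("narf" in xyz.attrib)  # False
```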
You mentioned in your comment that:
I still get a key error, regardless if I use xyzNode.attrib['poit'] or
xyzNode.attrib['xts:poit']
I think the crux of that issue (at least with regard to findall) is that it returns a list of Element objects, even when there is only a single matching Element, as seen here:
>>> print(xyzNode2)
[<Element 'xyz' at 0x7f59bed39150>]
So to use attrib, you need to access an element within that list. You can use a for-in loop to process each of them (or, in this case, the single one), or, if you know there's only one, access it directly with a [0] subscript, as I did above. Also note that the attribute's key is narf; xts:poit is its value, so the lookup you want is xyzNode2[0].attrib['narf'].
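Here is a short sketch of both approaches against the xmlStr2 document (re-declared without the XML declaration so the snippet runs on its own, assuming Python 3). It also uses find(), which returns the first matching Element or None rather than a list, and Element.get(), which returns None instead of raising KeyError when a key is missing.

```python
import xml.etree.ElementTree as ET

xml_str2 = """<stuff>
<abc foo="bar">Baz</abc>
<xyz narf="xts:poit">troz</xyz>
</stuff>"""

root2 = ET.fromstring(xml_str2)

# findall() always returns a list, so loop over it (or index into it).
xyz_nodes = root2.findall("./xyz")
for node in xyz_nodes:
    # The key is "narf"; "xts:poit" is the value.
    print(node.attrib["narf"])  # xts:poit

# find() returns a single Element (or None if nothing matches).
xyz_node = root2.find("./xyz")
if xyz_node is not None:
    print(xyz_node.attrib["narf"])  # xts:poit
    # get() returns None for a missing key instead of raising KeyError.
    print(xyz_node.get("poit"))  # None
```

find() is convenient when you only expect one match; checking for None before touching attrib avoids an AttributeError when the path matches nothing.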