I've got the following "test.xml" file:
<?xml version="1.0" encoding="UTF-8"?>
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Parent>
        <Child1 xsi:type="sample-type">
            <GrandChild1>123</GrandChild1>
            <GrandChild2>BranchName</GrandChild2>
        </Child1>
        <Child2 xsi:type="sample-type2"></Child2>
    </Parent>
</test:myXML>
I would like to retrieve the "xsi:type" attribute of any node where it exists. For example, in the above XML, I'd like to iterate over each node and return "sample-type" and "sample-type2".
So far, I've got the below code:
from lxml import etree
XMLDoc = etree.parse("test.xml")
rootXMLElement = XMLDoc.getroot()
tree = etree.parse("test.xml")
for Node in XMLDoc.xpath('//*'):
    if "xsi:type" in Node.attrib:
        # Do whatever
        pass
However, this doesn't work: the "xsi:type" check never matches, because the "xsi" prefix in the attribute name seems to be replaced by the URI from the xmlns:xsi namespace declaration. As an illustration, if I print each node's attributes using the below code:
from lxml import etree
XMLDoc = etree.parse("test.xml")
rootXMLElement = XMLDoc.getroot()
tree = etree.parse("test.xml")
for Node in XMLDoc.xpath('//*'):
    print(Node.attrib)
The result is:
{}
{}
{'{http://www.w3.org/2001/XMLSchema-instance}type': 'sample-type'}
{}
{}
{'{http://www.w3.org/2001/XMLSchema-instance}type': 'sample-type2'}
As you can see, where the "xsi:type" attribute exists, the "xsi" prefix is replaced with the namespace URI from the xmlns:xsi declaration. How can I stop that from happening? I'd like to search for "xsi:type" rather than hard-coding the full namespace URI from the declaration.
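For context, the keys in the printed output are in what lxml calls Clark notation, "{namespace-uri}localname", so the prefix itself is not part of the attribute key. Below is a minimal sketch of one possible way to avoid typing the URI by hand, assuming the "xsi" prefix is declared on the root element as in the file above: look the URI up in the root's nsmap and build the key from it. The XML is inlined so the sketch runs on its own, and the names xsi_uri, xsi_type, found_types and xpath_types are only illustrative.

```python
from lxml import etree

# Same content as test.xml above, inlined so the sketch is self-contained.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Parent>
        <Child1 xsi:type="sample-type">
            <GrandChild1>123</GrandChild1>
            <GrandChild2>BranchName</GrandChild2>
        </Child1>
        <Child2 xsi:type="sample-type2"></Child2>
    </Parent>
</test:myXML>"""

root = etree.fromstring(XML)

# Assumption: the "xsi" prefix is declared on the root element,
# so its URI can be read from nsmap instead of being hard-coded.
xsi_uri = root.nsmap["xsi"]
xsi_type = "{%s}type" % xsi_uri  # Clark notation: {namespace-uri}localname

# Option 1: iterate over all elements and check for the qualified key.
found_types = []
for node in root.iter("*"):
    if xsi_type in node.attrib:
        found_types.append(node.attrib[xsi_type])

print(found_types)  # ['sample-type', 'sample-type2']

# Option 2: let XPath resolve the prefix through an explicit prefix mapping.
xpath_types = root.xpath("//@xsi:type", namespaces={"xsi": xsi_uri})
print(list(xpath_types))
```

With the second option the literal "xsi:type" can stay in the query, as long as the prefix-to-URI mapping is passed in through the namespaces argument.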