
Suppose I have the following test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Parent>
  <FirstNode name="FirstNodeName"></FirstNode>
  <Child1>Test from Child1</Child1>
  <SecondNode name="SecondNodeName" type="SecondNodeType">
    <Child2>
      <GrandChild>Test from GrandChild</GrandChild>
    </Child2>
  </SecondNode>
</Parent>
</test:myXML>

I'd like to iterate over the whole tree and get the path of each node, including its attributes. I can already iterate over the tree and retrieve the path to each node as follows:

from lxml import etree

xmlDoc = etree.parse("test.xml")
root = xmlDoc.getroot()

for node in xmlDoc.iter():
    print("path: ", xmlDoc.getpath(node))

As expected, this prints out:

path:  /test:myXML
path:  /test:myXML/Parent
path:  /test:myXML/Parent/FirstNode
path:  /test:myXML/Parent/Child1
path:  /test:myXML/Parent/SecondNode
path:  /test:myXML/Parent/SecondNode/Child2
path:  /test:myXML/Parent/SecondNode/Child2/GrandChild

However, I'd also like to print the attributes of each node and of its ancestors along with the path. For example, if I print the element "Child2", the attributes of each of its parent elements should be displayed as well. Something like:

path:  /test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2

Is this possible? I'm not too fussed about the namespaces of the root element if that makes it easier.

Adam

1 Answer

I don't know of any prepackaged method to do that, but with all the enforced "working from home" going on, I figured I might as well try to come up with something. It's inelegant, but seems to do the job...

Try this on your actual XML and see if it works:

att = """
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Parent>
  <FirstNode name="FirstNodeName"></FirstNode>
  <Child1>Test from Child1</Child1>
  <SecondNode name="SecondNodeName" type="SecondNodeType">
    <Child2>
      <GrandChild>Test from GrandChild</GrandChild>
    </Child2>
  </SecondNode>
</Parent>
</test:myXML>
"""

from lxml import etree

bef = []
xps = []

xmlDoc = etree.fromstring(att)
root = etree.ElementTree(xmlDoc)

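for node in xmlDoc.iter():
    # Render this node's attributes as key="value" pairs separated by spaces
    ats = " ".join('{}="{}"'.format(k, v) for k, v in node.items())
    xp = root.getpath(node)
    bef.append(xp)  # plain XPath of the node
    # Same path, with the node's own attributes appended when it has any
    xps.append(xp + "{" + ats + "}" if ats else xp)

# Second pass: carry the parent's decorated path down to its child.
# The root is skipped, and a path is only rebuilt when the previous
# node in document order is the direct parent.
for i, (b, f) in enumerate(zip(bef, xps)):
    prev = i - 1
    if prev >= 0:
        cur = b.rsplit("/", 1)[0]      # plain path of the parent
        new_cur = f.rsplit("/", 1)[1]  # last step, attributes included
        if bef[prev] == cur:
            new_f = xps[prev] + "/" + new_cur
            xps[prev + 1] = new_f
            print(new_f)
        else:
            print(f)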

Output:

/test:myXML/Parent
/test:myXML/Parent/FirstNode{name="FirstNodeName"}
/test:myXML/Parent/Child1
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2
/test:myXML/Parent/SecondNode{name="SecondNodeName" type="SecondNodeType"}/Child2/GrandChild

If it works and you're interested, I can try to explain what all this does...
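
One limitation worth noting: the second loop above only rebuilds a path when the previous node in document order is the node's direct parent, so an ancestor's attributes can be lost for later siblings and their descendants. The sample XML hides this because Parent has no attributes. A minimal sketch of an alternative, assuming lxml and the same sample XML, is to walk each node's ancestors directly with iterancestors(). The helper names segment and path_with_attributes are hypothetical, chosen only for illustration:

```python
from lxml import etree

XML = """
<test:myXML xmlns:test="http://com/my/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Parent>
  <FirstNode name="FirstNodeName"></FirstNode>
  <Child1>Test from Child1</Child1>
  <SecondNode name="SecondNodeName" type="SecondNodeType">
    <Child2>
      <GrandChild>Test from GrandChild</GrandChild>
    </Child2>
  </SecondNode>
</Parent>
</test:myXML>
"""


def segment(tree, node):
    """Return the last step of the node's XPath, followed by its attributes."""
    # getpath() never contains attribute values, so splitting on "/" is safe
    step = tree.getpath(node).rsplit("/", 1)[1]
    attrs = " ".join('{}="{}"'.format(k, v) for k, v in node.items())
    return step + "{" + attrs + "}" if attrs else step


def path_with_attributes(tree, node):
    """Build the path from the root down to node, decorating every step."""
    # iterancestors() yields the parent first, then moves up towards the root
    parts = [segment(tree, node)]
    parts.extend(segment(tree, anc) for anc in node.iterancestors())
    return "/" + "/".join(reversed(parts))


doc_root = etree.fromstring(XML)
tree = etree.ElementTree(doc_root)

paths = [path_with_attributes(tree, node) for node in doc_root.iter()]
for p in paths:
    print(p)
```

Unlike the loop above, this version also prints a line for the root element, and each path is computed independently of the order in which nodes are visited.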

Jack Fleeting
  • Thank you! This makes sense. I've accepted this answer as it does what I originally asked but I decided to go down a different path with my problem and have raised another issue here: https://stackoverflow.com/q/60949859/3480297 Could you have a look please? – Adam Mar 31 '20 at 14:40