
I'm new to this, so please be patient.

Using ETree and Python 2.7, I'm trying to parse a large XML file that I did not generate. Basically, the file contains groups of voxels contained in a large volume. The general format is:

<things>
    <parameters>
        <various parameters> 
    </parameters>
    <thing id="1" comment="thing1">
        <nodes>
            <node id="1" x="1" y="1" z="1"/>
            <node id="2" x="2" y="2" z="2"/>
        </nodes>
        <edges>
            <edge source="1" target="2"/>
        </edges>
    </thing>
    <thing id="N" comment="thingN">
        <nodes>
            <node id="3" x="3" y="3" z="3"/>
            <node id="4" x="4" y="4" z="4"/>
        </nodes>
        <edges>
            <edge source="3" target="4"/>
        </edges>
    </thing>
    <comments>
        <comment node="1" content="interesting feature"/>
        <comment node="4" content="interesting feature"/>
    </comments>
</things>

A "node" contains the coordinates of a voxel, and a "thing" is a group of voxels. The "comments" are used to highlight nodes of interest.

I can find attributes of individual "node ids" using the find command, for example:

for elem in things.iterfind('thing/nodes/node[@id="221"]'):
    x = int(elem.get('x'))

I'd like to be able to determine the "thing id" to which any "node id" belongs (e.g. node 3 is in thing N). I know that I can do this using a for loop, iterating through the things and then the nodes, but I assume that there should be some way to do it more simply by finding the parent from the child.

I've tried every variant of:

elem.find(..)

that I can think of, but I get either

"None Type" or SyntaxError("cannot use absolute path on element")

I've tried the lxml getparent() command, too, as suggested in response to a similar query here: Get parent element after using find method (xml.etree.ElementTree) but to no avail.

Do I have to define the classes in this file to have complete access to the XPath tools?

Joshua Singer

3 Answers


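You need to traverse two levels up: from node to nodes, then from nodes to thing. Note that getparent() is provided by lxml.etree, not by xml.etree.ElementTree, so things must have been parsed with lxml.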

for elem in things.iterfind('thing/nodes/node[@id="1"]'):
    # parent of node: the <nodes> element
    print(elem.getparent())
    # grandparent of node: the <thing> element
    print(elem.getparent().getparent())
    # now let's get the thing id
    print(elem.getparent().getparent().get('id'))
Shaikhul

You can also use the XPath ancestor axis (this needs lxml as well):

for elem in things.iterfind('thing/nodes/node[@id="1"]'):
    # get the enclosing thing, i.e. an ancestor of node
    parent = elem.xpath('ancestor::thing')[0]
    # get the thing id
    print(parent.get('id'))

This way you don't have to call getparent() twice, and it is clearer which ancestor you are asking for.
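
If lxml is not an option, the standard-library xml.etree.ElementTree also accepts `..` steps in a path, as long as the search starts from an element above the parent you want. Here is a minimal sketch. The sample XML is trimmed from the question, and the names SAMPLE and thing_id are only placeholders for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure shown in the question (illustrative data only).
SAMPLE = """
<things>
    <thing id="1" comment="thing1">
        <nodes>
            <node id="1" x="1" y="1" z="1"/>
            <node id="2" x="2" y="2" z="2"/>
        </nodes>
    </thing>
    <thing id="N" comment="thingN">
        <nodes>
            <node id="3" x="3" y="3" z="3"/>
            <node id="4" x="4" y="4" z="4"/>
        </nodes>
    </thing>
</things>
"""

things = ET.fromstring(SAMPLE)

# Select the node, then step up twice: node -> nodes -> thing.
thing = things.find('thing/nodes/node[@id="3"]/../..')

# find() returns None when nothing matches, so guard before reading the id.
thing_id = thing.get('id') if thing is not None else None
print(thing_id)
```

The `..` step only works when the path is evaluated from the root (or another ancestor), which is probably why calling find('..') on the node itself returned None.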


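for all_tags in xmlTree.findall('.//*'):
    # caution: this returns the parent of the first element with this tag,
    # which is not necessarily the parent of all_tags itself
    parent = xmlTree.find('.//%s/..' % all_tags.tag)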
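
A more reliable way to get each element's own parent with plain ElementTree is to build a child-to-parent dictionary once and walk up from the node. The following is a sketch under assumptions, not a drop-in solution: the sample XML is trimmed from the question, and parent_of and thing_id_for are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure shown in the question (illustrative data only).
SAMPLE = """
<things>
    <thing id="1" comment="thing1">
        <nodes>
            <node id="1" x="1" y="1" z="1"/>
            <node id="2" x="2" y="2" z="2"/>
        </nodes>
    </thing>
    <thing id="N" comment="thingN">
        <nodes>
            <node id="3" x="3" y="3" z="3"/>
            <node id="4" x="4" y="4" z="4"/>
        </nodes>
    </thing>
    <comments>
        <comment node="4" content="interesting feature"/>
    </comments>
</things>
"""

root = ET.fromstring(SAMPLE)

# Map every element to its direct parent; the root has no entry.
parent_of = {child: parent for parent in root.iter() for child in parent}


def thing_id_for(node_id):
    """Return the id of the <thing> containing the given node id, or None."""
    elem = root.find('.//node[@id="%s"]' % node_id)
    # Climb until a <thing> is reached or the top of the tree is passed.
    while elem is not None and elem.tag != 'thing':
        elem = parent_of.get(elem)
    return elem.get('id') if elem is not None else None


print(thing_id_for("4"))
```

Building the map costs one pass over the tree, after which each lookup only climbs a few levels, so this should suit a large file where many nodes (for example, all commented ones) need to be resolved.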