Capture all XML element paths using xml.etree.ElementTree

Question

Using python import lxml I am able to print a list of the path for every element recursively:

from lxml import etree
root = etree.parse(xml_file)
for e in root.iter():
    path = root.getelementpath(e)
    print(path)

Results:

TreatmentEpisodes
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
etc.

Note: I am working with this XSD: https://www.myflfamilies.com/service-programs/samh/155-2/155-2-v14/schemas/TreatmentEpisodeDataset.xsd

I want to do the same thing using import xml.etree.ElementTree as ET ...but ElementTree does not seem to have an equivalent function to lxml getelementpath().

I've read the docs. I've googled for days. I've experimented with XPath. I've guessed using iter() and tried "getpath()", "Element.getpath()", etc. hoping to discover an undocumented feature. Fail.

Perhaps I am experiencing an extreme case of "user error" and please forgive me if this is a duplicate.

I thought I found the answer here: Get Xpath dynamically using ElementTree getpath() but the XPathEvaluator only seems to operate on a 'known' element - it doesn't have an option for "give me everything".

Here is what I tried:

import xml.etree.ElementTree as ET
tree = etree.parse(xml_file)
for entry in tree.xpath('//TreatmentEpisode'):
    print(entry)

Results:

<Element TreatmentEpisode at 0xffff8f8c8a00>

What I was hoping for:

TreatmentEpisodes/TreatmentEpisode

...however, even if I received what I hoped for, I am still not sure how to obtain the full path for every element. As I understand the XPath docs, they only operate on 'known' element names. i.e. tree.xpath() seems to require the element name to be known beforehand.

It sounds like you did research and attempted code to solve it ... now provide at least what you "experimented with XPath", your code - even if failing: [example]. So we can see how to adjust. — hc_dev, Jul 01 '21 at 18:48
Fair. I can say this: I thought I found the answer here: https://stackoverflow.com/questions/13136334/get-xpath-dynamically-using-elementtree-getpath but the XPathEvaluator only seems to operate on a 'known' element - it doesn't have an option for "give me everything". However, I will put together more examples of what I tried and edit my question. — gridiron, Jul 01 '21 at 18:52
Shouldn’t be too hard to write a function to build the path of an element. Did you try that? — DisappointedByUnaccountableMod, Jul 01 '21 at 20:51
I take that back - hadn’t realised there isn’t a parent attribute :-( However this shows a way to build a child->parent dictionary which can easily be the basis of getting the element path https://stackoverflow.com/questions/2170610/access-elementtree-node-parent-node#2170994 — DisappointedByUnaccountableMod, Jul 01 '21 at 21:05
@barny - your first comment was the trick. I never tried the function example in the later half of the SO link I quoted in my own question! Definitely severe user error going on here as I tried that function just now on your suggestion and it worked! Perhaps my eyes failed from reading too many forums but that was the answer right in front of me. Davide Brunato's post should be super flagged or something. — gridiron, Jul 01 '21 at 21:12
@barny Tried. Won't let me until I have "15 reputation". I've been lurking here for 15 years - this was my first post. :( — gridiron, Jul 01 '21 at 21:33

Valdi_Bo · Answer 1 · 2021-07-03T19:48:22.267

Start from:

import xml.etree.ElementTree as et

An interesting way to solve your problem is to use iterparse - an iterative parser contained in ElementTree.

It is able to report e.g. each start and end event, for each element parsed. For details search the Web for documentation / examples of iterparse.

The idea is to:

Start with an empty list as the path.
At the start event, append the element name to path and print the full path gathered so far.
At the end event, drop the last element from path.

You can even wrap this code in a generator function:

def pathGen(fn):
    path = []
    it = et.iterparse(fn, events=('start', 'end'))
    for evt, el in it:
        if evt == 'start':
            path.append(el.tag)
            yield '/'.join(path)
        else:
            path.pop()

Now, when you run:

for pth in pathGen('Input.xml'):
    print(pth)

you will get a printout of full paths of all elements in your source file, something like:

TreatmentEpisodes
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode
TreatmentEpisodes/TreatmentEpisode/SourceRecordIdentifier
TreatmentEpisodes/TreatmentEpisode/FederalTaxIdentifier
TreatmentEpisodes/TreatmentEpisode/ClientSourceRecordIdentifier
...

I would upvote your answer but I still don't have the minimum reputation to do so! — gridiron, Oct 09 '21 at 14:47
For future readers of this post, Valdi_Bo's answer works and also Davide Brunato's answer in this post: https://stackoverflow.com/questions/13136334/get-xpath-dynamically-using-elementtree-getpath — gridiron, Oct 09 '21 at 14:48

Capture all XML element paths using xml.etree.ElementTree

1 Answers1

Linked