
I'm stuck on something and can't find it in the documentation. I'm trying to query very large XML files based on the text value of certain elements, and one of the values I would love to get back from each query is the element's complete absolute path, including indexing information.

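import xml.etree.ElementTree as ET

xmlfile = "pathtoFile"  # placeholder: path to the XML file
tree = ET.parse(xmlfile)
root = tree.getroot()

# The lists must exist before anything is appended to them
Graphics_Name_List = []
Graphics_Start_List = []

# "/../.." steps up from file/name to the enclosing clipitem, then selects its name
for elm in root.findall("./sequence/media/video/track/clipitem/file/name[.='Graphic']/../../name"):
    CurrentClip = elm.text
    Graphics_Name_List.append(CurrentClip)

# Same query, but selecting the clipitem's start instead of its name
for elm in root.findall("./sequence/media/video/track/clipitem/file/name[.='Graphic']/../../start"):
    CurrentClip = elm.text
    Graphics_Start_List.append(CurrentClip)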

The above code appends the <name> of every graphic to a list named "Graphics_Name_List", and the <start> of every graphic to a list named "Graphics_Start_List", by using elm.text.

What I'm really hoping to find is a way to include the entire absolute path to these elements that I've queried. I found 2 interesting answers elsewhere on StackOverflow, but they do not include indexing.

Capture all XML element paths using xml.etree.ElementTree

Get Xpath dynamically using ElementTree getpath()

Both of those functions return Xpath like this:

./sequence/media/video/track/clipitem/filter/effect/parameter/name

But what I really require is an Xpath like this, with indexing:

./sequence[0]/media[0]/video[0]/track[3]/clipitem[56]/filter[0]/effect[0]/parameter[0]/name[0]

What I'm really trying to do is pull which Track/Clipitem it appears in (Track 3? Track 5?), but I'm currently finding that to be tough.
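For illustration, here is a minimal sketch of one way an indexed path could be built with xml.etree.ElementTree alone. It assumes a child-to-parent map is built once up front, and it counts positions from 1, as XPath does. The helper names `build_parent_map` and `indexed_path` and the sample document are hypothetical, not part of ElementTree or of the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, standing in for the real (much larger) file.
SAMPLE = """
<xmeml>
  <sequence>
    <media>
      <video>
        <track>
          <clipitem><name>Clip A</name><start>0</start><file><name>Video</name></file></clipitem>
        </track>
        <track>
          <clipitem><name>Clip B</name><start>10</start><file><name>Video</name></file></clipitem>
          <clipitem><name>Title 1</name><start>25</start><file><name>Graphic</name></file></clipitem>
        </track>
      </video>
    </media>
  </sequence>
</xmeml>
"""


def build_parent_map(root):
    """Map every element below root to its parent (ElementTree stores no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}


def indexed_path(elem, parent_map):
    """Return a path from the root to elem with a 1-based position on every step."""
    steps = []
    while elem in parent_map:
        parent = parent_map[elem]
        # Position is counted among siblings that share the same tag, as in XPath.
        same_tag = [sib for sib in parent if sib.tag == elem.tag]
        position = next(i for i, sib in enumerate(same_tag, start=1) if sib is elem)
        steps.append(f"{elem.tag}[{position}]")
        elem = parent
    return "./" + "/".join(reversed(steps))


root = ET.fromstring(SAMPLE)
parent_map = build_parent_map(root)  # built once, reused for every query result

query = "./sequence/media/video/track/clipitem/file/name[.='Graphic']/../../name"
for elm in root.findall(query):
    print(elm.text, indexed_path(elm, parent_map))
    # prints: Title 1 ./sequence[1]/media[1]/video[1]/track[2]/clipitem[2]/name[1]
```

The track and clipitem numbers can then be read off the `track[...]` and `clipitem[...]` steps. Building the map walks the whole tree once, so on very large files it costs extra memory on top of the parsed tree.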

So far I believe I'm only using features from xml.etree.ElementTree. I know there's a feature in lxml that can do this, but I don't know how to mix modules, and I became a little confused when passing an element that had been parsed with ET.parse(xmlfile) into etree.XPath(elm). Is that even possible?

  • If there is a feature in lxml, why not just use it? Avoid "mixing modules". – mzjn Jul 13 '22 at 05:46
  • Could you please explain how to do that? Do I have to build an entirely parallel set of variables? As you can see from my code, I've already built `import xml.etree.ElementTree as ET`, `xmlfile = *pathtoFile*`, `tree = ET.parse(xmlfile)`, `root = tree.getroot()`. So when I build for loops to query, I'm using my "root" variable from ET.parse. Do I just build the same thing alongside it, but using all lxml, and then build for loops referencing something like `for elm in root2.findall()`? – Car_SharkHybrid Jul 13 '22 at 06:09
  • https://stackoverflow.com/a/1577495/407651 – mzjn Jul 13 '22 at 06:10
  • Sorry, I meant how to implement it in my script. Please see the above comment. Thanks. – Car_SharkHybrid Jul 13 '22 at 06:12
  • You asked for a way to get the absolute xpath of elements and I provided a link. That solution uses lxml only. That means you should too. Do not try to mix `lxml.etree` objects and `xml.etree.ElementTree` objects. – mzjn Jul 13 '22 at 06:19
  • "an Xpath like this, with indexing: `./sequence[0]/media[0]/video[0]/track[3]/clipitem[56]/filter[0]/effect[0]/parameter[0]/name[0]`": interesting, last time I looked, in XPath the first index was 1 and not 0. So that example doesn't make a lot of sense with all those `[0]` predicates. – Martin Honnen Jul 16 '22 at 11:42
  • @martinhonnen You are correct, sorry for the mistake. On every line I have to change gears, because Python's first index is [0] and XPath's first index is [1]; when returning results from XPath into an array and then indexing the array, I have to keep track of whether my for loop starts at 0 or 1. Regardless, my question is how to get the absolute path when querying in ElementTree. – Car_SharkHybrid Jul 17 '22 at 19:28
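The 0-versus-1 indexing point raised in the comments can be sketched as follows. This is only an illustration on a hypothetical three-track document: ElementTree's positional predicate `tag[n]` counts from 1, while the list returned by `findall` is indexed from 0, and `enumerate(..., start=1)` is one way to keep a loop counter aligned with the XPath positions.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-track document, used only for this illustration.
root = ET.fromstring("<video><track id='a'/><track id='b'/><track id='c'/></video>")

tracks = root.findall("./track")          # Python list, indexed from 0
first_by_xpath = root.find("./track[1]")  # XPath position, counted from 1

# start=1 keeps the loop counter equal to the XPath position of each track.
for position, track in enumerate(tracks, start=1):
    assert root.find(f"./track[{position}]") is track
    print(position, track.get("id"))
```

Here `tracks[0]` and `root.find("./track[1]")` refer to the same element.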

0 Answers