find specific child in xml

Question

<graphiceditor>
    <plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">
        <parent>DS_Autobahn 1</parent>
        ...
        <curve name="" isXTopAxis="0" markerSize="8" symbol="-1">
            <point x="19.986891478960015" y="-0.00020825890723451596"/>
            <point ....

Hello, I want to open the .xml file, find the "curve" element and import the y-coordinates of its points into a list. I know that "curve" is at index [16], so I am using this right now:

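import xml.etree.ElementTree as ET

tree = ET.parse(file_name)
root = tree.getroot()
# root[0] is the first <plot>; its child at index 16 is assumed to be the <curve>
curvature = [float(i) for i in [x["y"] for x in [root[0][16][i].attrib for i in range(len(root[0][16]))]]]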

But how do I do it if curve is not at the 16th position? How do I find the curve element in an arbitrary XML file? I have been trying for several hours now, but I simply don't get it. Thank you very much in advance.

score 0 · Answer 1 · answered Jun 18 '18 at 16:29


You could use XPath for instance.

This would then essentially look like:

root.findall(xpath)

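where your xpath would be './/curve' if you are interested in all descendants with the tag curve, no matter how deeply they are nested. Note that ElementTree supports only a limited subset of XPath.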
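A minimal sketch of that approach with the standard-library ElementTree, assuming the same structure as in the question (a <curve> whose <point> children carry y attributes). The inline XML is a shortened, well-formed stand-in for the real file:

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the question's file.
xml_text = """<graphiceditor>
    <plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">
        <parent>DS_Autobahn 1</parent>
        <curve name="" isXTopAxis="0" markerSize="8" symbol="-1">
            <point x="19.986891478960015" y="-0.00020825890723451596"/>
            <point x="19.986891478960015" y="-0.00030825690983451678"/>
        </curve>
    </plot>
</graphiceditor>"""

root = ET.fromstring(xml_text)  # for a file: ET.parse(file_name).getroot()

# './/curve' finds every <curve> at any depth below root,
# so its index among its siblings no longer matters.
curvature = []
for curve in root.findall(".//curve"):
    # "point[@y]" keeps only <point> children that actually have a y attribute
    for point in curve.findall("point[@y]"):
        curvature.append(float(point.get("y")))

print(curvature)
```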

For more information regarding XPath, see w3schools.

answered Jun 18 '18 at 16:29

mjoppich


score 0 · Answer 2 · answered Jun 18 '18 at 16:37

I recommend learning about regular expressions (more commonly referred to as regex); I use them all the time for problems like this.

This is a good reference for the different aspects of regex: Regex

Regex is a way to match text. It's a lot like if "substring" in string:, except far more powerful: the whole purpose of regex is to find that "substring" even when you don't know exactly what it is.
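To make the comparison concrete, here is a small sketch on one <point> line from the question: the substring test only says whether y=" is present, while the regex also extracts the unknown value. The pattern y="([^"]*)" is an illustrative choice, not the one used later in this answer.

```python
import re

line = '<point x="19.986891478960015" y="-0.00020825890723451596"/>'

# A plain substring test can only answer "is y=\" in there?"
has_y = 'y="' in line

# A regex can also pull out the value without knowing it in advance:
# y=" is matched literally, and ([^"]*) captures everything up to the next quote.
match = re.search(r'y="([^"]*)"', line)
y_text = match.group(1) if match else None

print(has_y, y_text)
```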

So let's take a closer look at your example in particular. The first thing to do is figure out exactly which rules need to be true in order to "match" the y value. I don't know exactly how you are reading in your data, so I am reading it in as a single string.

string = '<graphiceditor>' \
    '<plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">' \
    '<parent>DS_Autobahn 1</parent>' \
    '<curve name="" isXTopAxis="0" markerSize="8" symbol="-1"' \
    '<point x="19.986891478960015" y="-0.00020825890723451596"/>' \
    '<point ....'

You can see I split the string into multiple lines to make it more readable. If you are reading it from a file with open(), make sure to remove the "\n" newline characters, or my regex won't work (not that you can't write a regex that would!).
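The reason is that in a regex, . does not match a newline by default. A small sketch of the effect, using lines from the question's XML as they might look after f.read(); passing the re.DOTALL flag is an alternative to stripping the "\n" characters:

```python
import re

# Text as it might come from f.read(): the tags sit on separate lines.
text = ('<curve name="" isXTopAxis="0" markerSize="8" symbol="-1">\n'
        '    <point x="19.986891478960015" y="-0.00020825890723451596"/>\n'
        '</curve>')

pattern = r'<curve.*?y="(.*?)"'

# By default "." stops at "\n", so the match cannot reach the <point> line.
without_flag = re.search(pattern, text)

# With re.DOTALL, "." also matches "\n", so the same pattern works on the raw text.
with_flag = re.search(pattern, text, re.DOTALL)
y_text = with_flag.group(1)

print(without_flag, y_text)
```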

The first thing I want to do is find the curve tag, then continue on to find the y= section, then grab just the number. Let's break that down into well-defined steps:

1. Find where the curve section begins.
2. Continue until the next y= section begins.
3. Get the value from inside the quotes after the y= section.

Now for the regex. I could explain exactly how it works, but we would be here all day; go back to the doc I linked at the start and read up.

import re
# string is the XML text defined above
y_val = re.search(r'<curve.*?y="(.*?)"', string).group(1)
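For reference, here is the same pattern written with re.VERBOSE, which ignores whitespace and allows comments inside the pattern, so each piece can be mapped to the three steps listed above. It is only an annotated restatement and matches exactly what the one-line version matches:

```python
import re

xml_text = ('<curve name="" isXTopAxis="0" markerSize="8" symbol="-1">'
            '<point x="19.986891478960015" y="-0.00020825890723451596"/>')

# Equivalent to r'<curve.*?y="(.*?)"'; whitespace and comments are ignored in VERBOSE mode.
pattern = re.compile(r'''
    <curve      # step 1: the literal text where the curve element begins
    .*?         # step 2: skip as few characters as possible ...
    y="         # ... until the first y=" that follows
    (.*?)       # step 3: capture everything up to ...
    "           # ... the closing quote
''', re.VERBOSE)

match = pattern.search(xml_text)
y_text = match.group(1)
print(y_text)
```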

That's it! Cast your y_val to a float() and you are ready to go!
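re.search stops at the first match, so y_val holds only the first point's y value. To collect every y value into a list, as the question asks, one option is to isolate the body of the curve element first and then run re.findall over it. A minimal sketch, using a shortened, well-formed copy of the question's XML as a stand-in for the real file; it assumes a single <curve> element:

```python
import re

xml_text = (
    '<graphiceditor>'
    '<plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">'
    '<parent>DS_Autobahn 1</parent>'
    '<curve name="" isXTopAxis="0" markerSize="8" symbol="-1">'
    '<point x="19.986891478960015" y="-0.00020825890723451596"/>'
    '<point x="19.986891478960015" y="-0.00030825690983451678"/>'
    '</curve>'
    '</plot>'
    '</graphiceditor>'
)

# Grab everything between the end of the <curve ...> start tag and </curve>;
# re.DOTALL lets "." span newlines if the text comes straight from a file.
curve_match = re.search(r'<curve\b.*?>(.*?)</curve>', xml_text, re.DOTALL)

# Inside that body, collect the value of every y="..." attribute.
curvature = [float(v) for v in re.findall(r'\sy="([^"]*)"', curve_match.group(1))]

print(curvature)
```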

Daniel Haley · Answer 3 · 2018-06-18T18:31:48.317

Use an XML parser to parse XML, not regex.

As mentioned in another answer, I would also use XPath. If you need to use complex XPaths, I'd recommend using lxml. In your example, though, ElementTree will suffice.

For example, this Python...

import xml.etree.ElementTree as ET

tree = ET.parse("file_name.xml")
root = tree.getroot()
curvature = [float(point.attrib["y"]) for point in root.findall(".//curve/point[@y]")]

print(curvature)

using this XML ("file_name.xml")...

<graphiceditor>
    <plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">
        <parent>DS_Autobahn 1</parent>
        <curve name="" isXTopAxis="0" markerSize="8" symbol="-1">
            <point x="19.986891478960015" y="-0.00020825890723451596"/>
            <point x="19.986891478960015" y="-0.00030825690983451678"/>
        </curve>
    </plot>
</graphiceditor>

will print...

[-0.00020825890723451596, -0.0003082569098345168]

Note the difference between the second y coordinate in the list and the value in the XML. That's because you're converting the value to a float, which holds only about 15 to 17 significant digits. Consider using decimal.Decimal if you need to keep the exact value.
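A sketch of the Decimal variant, assuming the same XML as above (inlined here so the snippet runs on its own). Constructing a Decimal directly from the attribute string keeps every digit:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

xml_text = """<graphiceditor>
    <plot name="DS_Autobahn 1.Track: Curvature &lt;78.4204 km>" type="CurvePlot">
        <parent>DS_Autobahn 1</parent>
        <curve name="" isXTopAxis="0" markerSize="8" symbol="-1">
            <point x="19.986891478960015" y="-0.00020825890723451596"/>
            <point x="19.986891478960015" y="-0.00030825690983451678"/>
        </curve>
    </plot>
</graphiceditor>"""

root = ET.fromstring(xml_text)

# Decimal(str) is exact, so no digits are lost the way they are with float().
curvature = [Decimal(point.attrib["y"]) for point in root.findall(".//curve/point[@y]")]

print(curvature)
```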
