I recommend learning about regular expressions (more commonly referred to as regex). I use them all the time for problems like this.
This is a good place to reference the different aspects of Regex:
Regex
Regex is a way to match text. It's a lot like if "substring" in string:
except a million times more powerful. The entire purpose of regex is to find that "substring" even when you don't know exactly what it is.
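To make that comparison concrete, here is a small sketch. The sample text and the variable names are made up for illustration and are not from your data.

```python
import re

text = 'point x="19.98" y="-0.0002"'  # made-up sample text

# Plain substring test: you must already know the exact text you want.
print('y="-0.0002"' in text)

# Regex: describe the shape of the text and let the engine find it.
# The parentheses capture whatever sits between the quotes after y=.
match = re.search(r'y="([^"]*)"', text)
print(match.group(1))
```

The substring test only answers yes or no for text you already have, while the regex hands back the unknown value itself.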
So let's take a closer look at your example in particular. The first thing to do is figure out exactly which rules need to be true in order to "match" the y value.
I don't know exactly how you are reading in your data, but I am reading it in as a single string.
string = '<graphiceditor>' \
'<plot name="DS_Autobahn 1.Track: Curvature <78.4204 km>" type="CurvePlot">' \
'<parent>DS_Autobahn 1</parent>' \
'<curve name="" isXTopAxis="0" markerSize="8" symbol="-1"' \
'<point x="19.986891478960015" y="-0.00020825890723451596"/>' \
'<point ....'
You can see I split the string into multiple lines to make it more readable. If you are reading it from a file with open(), make sure to remove the "\n" newline characters or my regex won't work (not that you can't write regex that would!).
The first thing I want to do is find the curve tag, then continue on to find the y= section, then grab just the number. Let's break that down into well-defined steps:
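Here is a rough sketch of why the newlines matter and two ways to deal with them. The two-line sample is made up to stand in for text read from a file, and the names raw and flat are just for illustration.

```python
import re

# Made-up two-line sample standing in for text read from a file.
raw = '<curve name=""\n<point x="1.0" y="2.5"/>'

# By default "." does not match "\n", so the pattern cannot cross the line break.
no_match = re.search('<curve.*?y="(.*?)"', raw)
print(no_match)

# Option 1: strip the newlines first, then search.
flat = raw.replace("\n", "")
y_flat = re.search('<curve.*?y="(.*?)"', flat).group(1)
print(y_flat)

# Option 2: keep the newlines and let "." match them with re.DOTALL.
y_dotall = re.search('<curve.*?y="(.*?)"', raw, re.DOTALL).group(1)
print(y_dotall)
```

Either option should work. Stripping the newlines keeps the pattern unchanged, while re.DOTALL leaves the input untouched.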
- Find where the curve section begins
- Continue until the next y= section begins
- Get the value from inside the quotes after the y= section.
Now for the regex. I could explain exactly how it works, but we would be here all day. Go back to the doc I linked at the start and read up.
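import re
string = "[see above]"  # placeholder for the string built above
# Start at '<curve', lazily skip ahead to the first y=", and capture the text inside the quotes
y_val = re.search('<curve.*?y="(.*?)"', string).group(1)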
That's it! Convert your y_val with float() and you are ready to go!
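Putting the pieces together, here is a self-contained sketch you can run. It uses the sample string from above, cut off after the first point because the rest of the data is elided. The None check is my own addition, since re.search returns None when nothing matches and calling .group(1) on that would raise an AttributeError.

```python
import re

# Sample data copied from above, stopping after the first point
# because the remaining points are elided in the original.
string = (
    '<graphiceditor>'
    '<plot name="DS_Autobahn 1.Track: Curvature <78.4204 km>" type="CurvePlot">'
    '<parent>DS_Autobahn 1</parent>'
    '<curve name="" isXTopAxis="0" markerSize="8" symbol="-1"'
    '<point x="19.986891478960015" y="-0.00020825890723451596"/>'
)

# '<curve'  : find where the curve section begins
# '.*?'     : lazily continue until the next y= section
# '"(.*?)"' : capture the value inside the quotes
match = re.search('<curve.*?y="(.*?)"', string)
if match is None:
    raise ValueError("no y value found after a <curve tag")

y_val = float(match.group(1))
print(y_val)
```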