I'm trying to repair some old code (written by another author) that no longer finds the results it used to.
The crux of the problem is this snippet:
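import lxml.etree

# Read as bytes: on Python 3, lxml raises ValueError for a str that carries an encoding declaration.
with open('some.xml', 'rb') as xml_file:
    xml_tree = lxml.etree.fromstring(xml_file.read())

# Select every <track> element whose "type" attribute equals $track_type.
tracks_filter = lxml.etree.XPath('//track[@type = $track_type]')
tracks = tracks_filter(xml_tree, track_type="General")
print("Found %u tracks" % len(tracks))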
My understanding is that, given XML containing some nodes, this finds every track node that has a "type" attribute equal to "General". In the XML below it should therefore find the first track node, but it doesn't.
The XML is (MediaInfo output, reduced for brevity):
<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo
    xmlns="https://mediaarea.net/mediainfo"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://mediaarea.net/mediainfo https://mediaarea.net/mediainfo/mediainfo_2_0.xsd"
    version="2.0">
  <creatingLibrary version="17.12" url="https://mediaarea.net/MediaInfo">MediaInfoLib</creatingLibrary>
  <media ref="/var/lib/mythtv/recordings/1034_20180715093100.ts">
    <track type="General">
      <ID>848</ID>
      <VideoCount>1</VideoCount>
      <AudioCount>1</AudioCount>
      <TextCount>1</TextCount>
    </track>
    <track type="Video">
      <StreamOrder>0-0</StreamOrder>
      <ID>164</ID>
      <MenuID>1</MenuID>
      <Format>MPEG Video</Format>
      <Format_Version>2</Format_Version>
    </track>
    <track type="Text">
      <ID>45-801</ID>
      <MenuID>1</MenuID>
      <Format>Teletext Subtitle</Format>
      <Language>en</Language>
    </track>
  </media>
</MediaInfo>
NOTE: I hand-edited this XML to reduce its size, so any XML errors are probably my own handiwork (or lack thereof).
I expect the code to parse the XML and find the track element that has the attribute type="General", but it always finds nothing. The code I am trying to repair (update) contains many such parsing operations.
Thanks for your help.
EDIT - So the actual answer:
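XPath needs to be told the namespace of the XML. Here the root element declares a default namespace, "https://mediaarea.net/mediainfo" (3rd line of the XML), so every unprefixed element below it, including each track, belongs to that namespace. An unprefixed name in an XPath 1.0 expression only matches elements in no namespace, which is why //track finds nothing.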
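A quick way to confirm that a namespace is the culprit is to inspect the parsed root element. The sketch below assumes a shortened, hypothetical stand-in for the MediaInfo output (the SAMPLE_XML constant is not part of the original code) and only uses the nsmap and tag attributes that lxml elements expose:

```python
import lxml.etree

# Hypothetical, shortened stand-in for the MediaInfo output shown above.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" version="2.0">
  <media>
    <track type="General"><ID>848</ID></track>
  </media>
</MediaInfo>"""

root = lxml.etree.fromstring(SAMPLE_XML)

# lxml stores the default (unprefixed) namespace under the key None.
default_ns = root.nsmap.get(None)
print(default_ns)

# Tags carry the namespace in Clark notation: {uri}localname.
print(root.tag)

# An unprefixed XPath name matches only elements in no namespace,
# so this count is zero even though a <track> element is present.
print(len(root.xpath('//track')))
```

If the root tag prints with a {…} prefix, every XPath expression against that document needs a prefix mapped to the same URI.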
So the path given to lxml.etree.XPath needs a namespace prefix on the element name, and a dictionary mapping that prefix to the namespace URI has to be passed along with it:
tracks_filter = lxml.etree.XPath(u'//i:track[@type = $track_type]', namespaces={'i':'https://mediaarea.net/mediainfo'})
tracks = tracks_filter(xml_tree, track_type = "General")
With that change the filter finds the "General" track.
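For anyone who wants to try the fix without a MediaInfo file at hand, here is a minimal self-contained sketch. The SAMPLE_XML constant is a hypothetical, shortened copy of the output above, and taking the URI from the root's nsmap instead of hard-coding it is my own assumption about what is convenient, not something the original code did:

```python
import lxml.etree

# Hypothetical, shortened stand-in for the MediaInfo output shown above.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" version="2.0">
  <media>
    <track type="General"><ID>848</ID></track>
    <track type="Video"><ID>164</ID></track>
  </media>
</MediaInfo>"""

xml_tree = lxml.etree.fromstring(SAMPLE_XML)

# 'i' is an arbitrary prefix; only the URI has to match the document.
# Assumption: the document declares a default namespace on its root element.
namespaces = {'i': xml_tree.nsmap[None]}

# Compile once, then reuse with different values for $track_type.
tracks_filter = lxml.etree.XPath('//i:track[@type = $track_type]',
                                 namespaces=namespaces)

general_tracks = tracks_filter(xml_tree, track_type="General")
video_tracks = tracks_filter(xml_tree, track_type="Video")
print("Found %u General tracks" % len(general_tracks))

# Child lookups on the results need the same prefix mapping.
general_id = general_tracks[0].findtext('i:ID', namespaces=namespaces)
print("General track ID: %s" % general_id)
```

Since the code being repaired contains many such lookups, keeping one shared namespaces dictionary and passing it to every XPath or find call avoids repeating the URI throughout.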