
I'm trying to repair some old code (from another author) that no longer seems to find results the way it used to.

The crux of the problem is this snippet:

import lxml.etree

# Read as bytes: lxml refuses a str that carries an encoding declaration
with open('some.xml', 'rb') as f:
    some_xml = f.read()
xml_tree = lxml.etree.fromstring(some_xml)

tracks_filter = lxml.etree.XPath(u'//track[@type = $track_type]')
tracks = tracks_filter(xml_tree, track_type = "General")

print("Found %u tracks" % (len(tracks)))

My understanding is that, given some XML, this finds every track node whose "type" attribute equals "General". So in the XML below it should find the first track node, but it doesn't.
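That reading of the expression is right as far as it goes. As a sanity check, here is a sketch run against a hypothetical copy of the data with no namespace declaration at all; the `$track_type` XPath variable is bound from the keyword argument, and the query finds the General track:

```python
import lxml.etree

# Hypothetical sample data: same shape as the MediaInfo output, but NO xmlns.
plain_xml = b"""<MediaInfo><media>
<track type="General"><ID>848</ID></track>
<track type="Video"><ID>164</ID></track>
</media></MediaInfo>"""

root = lxml.etree.fromstring(plain_xml)

# $track_type is an XPath variable; lxml fills it from the keyword argument.
tracks_filter = lxml.etree.XPath('//track[@type = $track_type]')
tracks = tracks_filter(root, track_type="General")

print("Found %u tracks" % len(tracks))  # Found 1 tracks
print(tracks[0].findtext('ID'))         # 848
```

So the expression and the variable binding are fine; the difference must be in the real document.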

The XML is (output from MediaInfo, reduced for brevity):

<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo
    xmlns="https://mediaarea.net/mediainfo"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://mediaarea.net/mediainfo https://mediaarea.net/mediainfo/mediainfo_2_0.xsd"
    version="2.0">
<creatingLibrary version="17.12" url="https://mediaarea.net/MediaInfo">MediaInfoLib</creatingLibrary>
<media ref="/var/lib/mythtv/recordings/1034_20180715093100.ts">

<track type="General">
<ID>848</ID>
<VideoCount>1</VideoCount>
<AudioCount>1</AudioCount>
<TextCount>1</TextCount>
</track>

<track type="Video">
<StreamOrder>0-0</StreamOrder>
<ID>164</ID>
<MenuID>1</MenuID>
<Format>MPEG Video</Format>
<Format_Version>2</Format_Version>
</track>

<track type="Text">
<ID>45-801</ID>
<MenuID>1</MenuID>
<Format>Teletext Subtitle</Format>
<Language>en</Language>
</track>

</media>
</MediaInfo>

NOTE: I have hand-edited this XML to reduce its size, so any XML errors are probably from my handiwork (or lack thereof).

I expect the code to parse the XML and find the track element that has the attribute type="General", but it always finds nothing. The code I am trying to repair (update) contains many such parsing operations.
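A minimal reproduction, assuming a cut-down copy of the XML above embedded as bytes. Printing the root's tag shows that the default namespace has become part of the element name (lxml's `{uri}localname` form), and the unprefixed query finds nothing:

```python
import lxml.etree

# Cut-down copy of the MediaInfo output above; keeps the default namespace.
some_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" version="2.0">
<media ref="/var/lib/mythtv/recordings/1034_20180715093100.ts">
<track type="General"><ID>848</ID></track>
<track type="Video"><ID>164</ID></track>
</media>
</MediaInfo>"""

xml_tree = lxml.etree.fromstring(some_xml)

# The namespace URI is part of every element's name.
print(xml_tree.tag)  # {https://mediaarea.net/mediainfo}MediaInfo

# Same query as the original code: 'track' here means "track in NO namespace".
tracks_filter = lxml.etree.XPath('//track[@type = $track_type]')
tracks = tracks_filter(xml_tree, track_type="General")
print("Found %u tracks" % len(tracks))  # Found 0 tracks
```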

Thanks for your help.

EDIT: The actual answer:

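XPath needs to be told the namespace of the XML. Here the root element declares a default namespace, "https://mediaarea.net/mediainfo" (the xmlns attribute on the 3rd line of the XML), so every element in the document, including track, belongs to that namespace. In XPath 1.0 (which lxml uses) an unprefixed name such as track only matches elements in no namespace, which is why //track finds nothing.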
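To see where the namespace comes from, lxml exposes it on the parsed tree. This sketch (using a trimmed, hypothetical copy of the document) reads the default namespace from the root's `nsmap`, where it is stored under the key `None`, and splits the track element's name with `lxml.etree.QName`. It assumes the default namespace is declared on the root element, as it is in the MediaInfo output above:

```python
import lxml.etree

some_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" version="2.0">
<media><track type="General"><ID>848</ID></track></media>
</MediaInfo>"""

xml_tree = lxml.etree.fromstring(some_xml)

# nsmap maps prefixes to URIs; a default namespace is stored under the key None.
default_ns = xml_tree.nsmap.get(None)
print(default_ns)  # https://mediaarea.net/mediainfo

# Child elements inherit it: track's full name is namespace + local name.
track = xml_tree[0][0]  # MediaInfo -> media -> track
name = lxml.etree.QName(track)
print(name.namespace, name.localname)  # https://mediaarea.net/mediainfo track

# XPath 1.0 cannot refer to "the default namespace", so bind the URI to a prefix.
ns = {'i': default_ns}
print(len(xml_tree.xpath('//i:track', namespaces=ns)))  # 1
```

Reading the URI from the document rather than hardcoding it is a design choice: it keeps working if the URI changes, but fails if the input has no default namespace at all.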

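So the element name in the path needs a prefix, and lxml.etree.XPath needs a namespaces dictionary that maps that prefix to the namespace URI. The prefix itself ("i" here) is arbitrary. The @type attribute stays unprefixed, because unprefixed attributes are not in any namespace: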

# 'i' is an arbitrary prefix bound to the MediaInfo namespace URI
tracks_filter = lxml.etree.XPath(u'//i:track[@type = $track_type]',
                                 namespaces={'i': 'https://mediaarea.net/mediainfo'})
tracks = tracks_filter(xml_tree, track_type="General")

Then it works.
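Since the code being repaired contains many such parsing operations, a sketch of one way to apply the same fix throughout: define the prefix map once (`MEDIAINFO_NS` is a name introduced here, not from the original code) and pass it to every compiled expression. Note that child elements such as ID are in the default namespace too, so they also need the prefix. The last lines show the equivalent ElementPath `findall()` call, which accepts the same prefix map but has no XPath variables:

```python
import lxml.etree

# Hypothetical shared prefix map, reused by every query.
MEDIAINFO_NS = {'i': 'https://mediaarea.net/mediainfo'}

tracks_filter = lxml.etree.XPath('//i:track[@type = $track_type]',
                                 namespaces=MEDIAINFO_NS)
# ID is also in the default namespace, so it needs the prefix as well.
id_filter = lxml.etree.XPath('string(i:ID)', namespaces=MEDIAINFO_NS)

some_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<MediaInfo xmlns="https://mediaarea.net/mediainfo" version="2.0">
<media>
<track type="General"><ID>848</ID></track>
<track type="Video"><ID>164</ID></track>
<track type="Text"><ID>45-801</ID></track>
</media>
</MediaInfo>"""
xml_tree = lxml.etree.fromstring(some_xml)

results = []
for track_type in ("General", "Video", "Text"):
    for track in tracks_filter(xml_tree, track_type=track_type):
        # The XPath object evaluates relative to the element it is called on.
        results.append((track_type, str(id_filter(track))))
print(results)  # [('General', '848'), ('Video', '164'), ('Text', '45-801')]

# ElementPath alternative: literal value instead of a variable.
general = xml_tree.findall(".//i:track[@type='General']", namespaces=MEDIAINFO_NS)
print(len(general))  # 1
```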

Kingsley
You're not taking into account the default namespace in your XPath. See the duplicate link for further details. – kjhughes Aug 09 '18 at 03:09
