
Given the following XML (fragment):

<node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90">
<node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100">
<node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80">

I want to retrieve the id of nodes that have an ext:score of 100.

The current code:

match = dom.xpath('//node[@ext:score="100"]/@id')[0]

Raises an exception:

lxml.etree.XPathEvalError: Undefined namespace prefix
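To reproduce the error in isolation, here is a minimal sketch assuming lxml and a hypothetical document (the `http://example.com/ext` URI and the `a` id are made up for illustration). Even when the document itself declares `ext`, lxml's `xpath()` only knows the prefixes passed through its `namespaces` argument, so a call without that mapping fails the same way.

```python
from lxml import etree

# Hypothetical document: ext is declared here only so that the XML parses.
xml = b'''<metadata xmlns:ext="http://example.com/ext">
  <node id="a" type="Person" ext:score="100"/>
</metadata>'''
dom = etree.fromstring(xml)

try:
    # No namespaces= mapping is passed, so 'ext' is unknown to the XPath engine.
    dom.xpath('//node[@ext:score="100"]/@id')
    error = None
except etree.XPathEvalError as exc:
    error = str(exc)

print(error)
```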

I have read (both here and in the XPath docs) that ext first needs to be defined as a valid namespace, because the colon in ext:score marks a namespace prefix rather than being part of a plain attribute name. However, I have been unable to find a good example of how to do this. There is no definition of ext in the excerpts I am processing, and I'm not sure how to create a namespace prefix.

Any thoughts?

  • Possible duplicate of [How does XPath deal with XML namespaces?](https://stackoverflow.com/questions/40796231/how-does-xpath-deal-with-xml-namespaces) – kjhughes Nov 17 '17 at 15:01
  • I've read that @kjhughes, and I understand how to *create* a namespace, but I don't see how I can then use that namespace prefix to test for a condition. Still looking... Thanks! – Jasper33 Nov 17 '17 at 15:07
  • Does your XML have a namespace declaration for `ext` -- something like `xmlns:ext="http://example.com/extention"` on an element above the `node` elements? – kjhughes Nov 17 '17 at 15:14
  • @kjhughes - I don't (these come to me *as-is*), but I've been told that the original contains this: `` which is what I used to try to *synthesize* the prefix. – Jasper33 Nov 17 '17 at 15:34


The colon character in an XML attribute (or element) name such as ext:score separates the namespace prefix, ext, from the local name, score. Namespace prefixes themselves are significant only by virtue of their association with a namespace value.
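A small lxml sketch of that point, assuming the `http://musicbrainz.org/ns/mmd-2.0#` URI used in the example below (the second document's `x` prefix is hypothetical): after parsing, the prefix is gone and the attribute is keyed by its namespace URI plus local name, so two different prefixes bound to the same URI produce the same attribute name.

```python
from lxml import etree

NS = 'http://musicbrainz.org/ns/mmd-2.0#'

xml = b'''<metadata xmlns:ext="http://musicbrainz.org/ns/mmd-2.0#">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
</metadata>'''
node = etree.fromstring(xml)[0]  # the first (and only) node element

# lxml stores a prefixed attribute under its expanded name: {namespace}local-name
for name, value in node.attrib.items():
    q = etree.QName(name)
    print(q.namespace, q.localname, value)

# A different prefix bound to the same URI yields the same expanded name.
other = etree.fromstring(
    b'<node xmlns:x="http://musicbrainz.org/ns/mmd-2.0#" x:score="100"/>')
same_key = '{%s}score' % NS in other.attrib
```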

For this XML,

<metadata xmlns:ext="http://musicbrainz.org/ns/mmd-2.0#">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
</metadata>

This XPath,

//node[@ext:score="100"]/@id

will select the id attributes of all node elements with an ext:score attribute value of 100, provided you have a way to bind the namespace prefix (ext) to the namespace value (http://musicbrainz.org/ns/mmd-2.0#) in the language or tool from which XPath is being called.
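If binding a prefix is not convenient, a common XPath 1.0 workaround is to match the attribute by its local name with `local-name()`. This is a different technique from the binding described here, sketched below with lxml; note that it matches a `score` attribute in any namespace (or none), so it is looser than the prefixed form.

```python
from lxml import etree

xml = b'''<metadata xmlns:ext="http://musicbrainz.org/ns/mmd-2.0#">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
</metadata>'''
doc = etree.fromstring(xml)

# Match on the attribute's local name only, so no prefix needs to be bound.
ids = doc.xpath('//node[@*[local-name()="score"]="100"]/@id')
# ids == ['b071f9fa-14b0-4217-8e97-eb41da73f599']
```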

To bind a namespace prefix to a namespace value in Python (see "How does XPath deal with XML namespaces?" for Python and other language examples):

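from io import StringIO
from lxml import etree

f = StringIO('your XML here')
doc = etree.parse(f)
# The URI bound to 'ext' must match the xmlns:ext declaration in the XML above
r = doc.xpath('//node[@ext:score="100"]/@id',
              namespaces={'ext': 'http://musicbrainz.org/ns/mmd-2.0#'})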
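A self-contained variant using the example XML from this answer, as a sketch: instead of typing the URI twice, it reads the value the document declares for `ext` from lxml's `nsmap`, so the binding cannot drift from the data. It also guards the `[0]` from the question, which raises `IndexError` when nothing matches.

```python
from lxml import etree

xml = b'''<metadata xmlns:ext="http://musicbrainz.org/ns/mmd-2.0#">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
</metadata>'''
root = etree.fromstring(xml)

# Reuse the URI the document itself declares for the 'ext' prefix.
ns = {'ext': root.nsmap['ext']}
ids = root.xpath('//node[@ext:score="100"]/@id', namespaces=ns)
match = ids[0] if ids else None  # avoid IndexError when no node scores 100
```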

Note that if your XML uses ext without declaring it, it is not namespace-well-formed.
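For excerpts that arrive without any `ext` declaration, one workaround is to wrap the fragment in a synthetic root element that declares the prefix before parsing. This is a sketch under assumptions: the `wrapper` element name and the `parse_with_ext` helper are hypothetical, the node elements are assumed to be closed, and the URI simply mirrors this answer's example (substitute whatever the original documents declare; what matters is that the same URI is bound in the `xpath()` call).

```python
from lxml import etree

# Assumption: URI taken from the answer's example; replace with the real one.
EXT_NS = 'http://musicbrainz.org/ns/mmd-2.0#'

# Hypothetical fragment as received: ext:score is used but never declared.
fragment = '''<node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
<node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
<node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>'''


def parse_with_ext(fragment_text, ext_ns=EXT_NS):
    """Wrap the fragment in a hypothetical root element that declares ext."""
    wrapped = f'<wrapper xmlns:ext="{ext_ns}">{fragment_text}</wrapper>'
    return etree.fromstring(wrapped)


root = parse_with_ext(fragment)
ids = root.xpath('//node[@ext:score="100"]/@id', namespaces={'ext': EXT_NS})
```

This only works if the fragment carries no `<?xml ...?>` declaration of its own, since an XML declaration is not allowed in the middle of a document.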
