
I am trying to parse a Maven project definition (POM) with Python to extract its version.

The project definition looks like:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
                        http://maven.apache.org/maven-v4_0_0.xsd">
   <modelVersion>4.0.0</modelVersion>

   <groupId>...</groupId>
   <artifactId>...</artifactId>
   <version>1.6.0-SNAPSHOT</version>
   ...
</project>

I can extract the version using:

import xml.etree.ElementTree as ET

# xml holds the text of the project definition shown above
root = ET.fromstring(xml)
version = root.find('./p:version', {'p': 'http://maven.apache.org/POM/4.0.0'})
print(version.text)

prints: 1.6.0-SNAPSHOT

However, the namespace used may change, and I don't want to depend on it. Is there a way to extract the namespace so that I can use it in my subsequent XPath expression?

I tried the following to see whether xmlns itself is exposed as an attribute, but had no luck:

root = ET.fromstring(xml)
for k in root.attrib:
    print('%s => %s' % (k, root.attrib[k]))

prints: {http://www.w3.org/2001/XMLSchema-instance}schemaLocation => http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd
toolkit
  • possible duplicate of [Accessing XMLNS attribute with Python Elementree?](http://stackoverflow.com/questions/1953761/accessing-xmlns-attribute-with-python-elementree) – Gareth Latty Jan 16 '13 at 16:12

2 Answers


Unfortunately, ElementTree namespace support is rather patchy.

You'll need to use an internal function from the xml.etree.ElementTree module to get a namespace map out:

_, namespaces = ET._namespaces(root, 'utf8')

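namespaces is now a dict with URIs as keys and prefixes as values. Because _namespaces is a private helper, its signature and behaviour may differ between Python versions.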
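If all you need is the namespace URI of the root element, a sketch using only public API may be enough, because ElementTree stores every qualified tag as '{uri}localname'. The helper name root_namespace and the shortened POM below are hypothetical, and the sketch assumes the version element lives in the same namespace as the root:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened POM used only for illustration.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
   <modelVersion>4.0.0</modelVersion>
   <version>1.6.0-SNAPSHOT</version>
</project>"""


def root_namespace(element):
    """Return the namespace URI of an element, or '' if it has none."""
    # ElementTree represents qualified names as '{uri}local'.
    if element.tag.startswith('{'):
        return element.tag[1:].split('}', 1)[0]
    return ''


root = ET.fromstring(POM)
uri = root_namespace(root)

# 'p' is an arbitrary prefix; it only has to match the key in the mapping.
version = root.find('./p:version', {'p': uri})
print(version.text)
```

This reads the URI from the document instead of hard-coding it, so it accepts whatever namespace the root happens to use; whether that is desirable depends on how strict you want the parsing to be.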

You could switch to lxml instead. That library implements the same ElementTree API and extends it considerably.

For example, each node includes a .nsmap attribute which maps prefixes to URIs, including the default namespace under the key None.

Martijn Pieters

However, the namespace used may change, and I don't want to depend on this.

Are you saying that the namespace URI might change, or that the prefix might? If it's just the prefix, that's not an issue: what matters is that the prefixes in your XPath match the prefixes you supply to the XPath evaluator, not the prefixes used in the document. And in either case, auto-detecting the namespaces is probably a bad call. Suppose someone decides to start generating that XML like this:

<proj:project xmlns:proj="http://maven.apache.org/POM/4.0.0" 
xmlns:other="http://maven.apache.org/POM/5.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
                    http://maven.apache.org/maven-v4_0_0.xsd">

This still represents the same XML in the same namespace as your example, but auto-detection has no way of knowing that proj is the prefix you're looking for.
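To illustrate the point about prefixes, here is a minimal sketch with two hypothetical, shortened documents: one declares the POM namespace as the default, the other binds it to proj. The function name pom_version and the query prefix m are made up for the example; only the URI has to match.

```python
import xml.etree.ElementTree as ET

POM_NS = 'http://maven.apache.org/POM/4.0.0'

# Hypothetical documents: same namespace URI, different prefixes.
DEFAULT_PREFIX = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<version>1.6.0-SNAPSHOT</version>'
    '</project>'
)
NAMED_PREFIX = (
    '<proj:project xmlns:proj="http://maven.apache.org/POM/4.0.0" '
    'xmlns:other="http://maven.apache.org/POM/5.0.0">'
    '<proj:version>1.6.0-SNAPSHOT</proj:version>'
    '</proj:project>'
)


def pom_version(xml_text):
    """Return the text of the root's version child, or None if absent."""
    root = ET.fromstring(xml_text)
    # 'm' is our own prefix; it is resolved through the mapping we pass in,
    # not through the prefixes declared in the document.
    node = root.find('./m:version', {'m': POM_NS})
    return node.text if node is not None else None


versions = [pom_version(DEFAULT_PREFIX), pom_version(NAMED_PREFIX)]
print(versions)
```

Both documents yield the same version, which is why a change of prefix in the generated XML should not affect a query that is keyed on the URI.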

I think it's unlikely that Apache would suddenly change the namespace for one of their official XML formats, but if you are genuinely worried about it, there should always be the option of using local-name() to namespace-agnostically find a node you're looking for:

version = root.find('./*[local-name() = "version"]')
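The local-name() function belongs to full XPath 1.0, and ElementTree's find supports only a subset of XPath, so the expression above may be rejected there. A rough equivalent in plain Python is to compare the local part of each child's tag. The helper names local_name and find_child_by_local_name and the shortened POM are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened POM used only for illustration.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
   <modelVersion>4.0.0</modelVersion>
   <version>1.6.0-SNAPSHOT</version>
</project>"""


def local_name(tag):
    """Strip a leading '{uri}' from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def find_child_by_local_name(parent, name):
    """Return the first direct child whose local name matches, else None."""
    for child in parent:
        # Comments and processing instructions have non-string tags.
        if isinstance(child.tag, str) and local_name(child.tag) == name:
            return child
    return None


root = ET.fromstring(POM)
version = find_child_by_local_name(root, 'version')
print(version.text)
```

On Python 3.8 and later, ElementTree's find also accepts a namespace wildcard, so root.find('./{*}version') should select the same element without a helper.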

Also, I'm not familiar with the ElementTree library, but you could try this to get information about the XML document's namespaces, just to see whether it works:

namespaces = root.findall('//namespace::*')
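The namespace axis is also part of full XPath and may not be available in ElementTree's limited XPath subset. If it is not, the declarations can be collected while parsing instead, using the start-ns event of ET.iterparse. The function name declared_namespaces and the shortened POM are hypothetical:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened POM used only for illustration.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <version>1.6.0-SNAPSHOT</version>
</project>"""


def declared_namespaces(xml_text):
    """Map each declared prefix to its URI ('' is the default namespace)."""
    namespaces = {}
    source = io.BytesIO(xml_text.encode('utf-8'))
    # Each 'start-ns' event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        # A prefix redeclared deeper in the document overwrites the earlier one.
        namespaces[prefix] = uri
    return namespaces


namespaces = declared_namespaces(POM)
print(namespaces)
```

This only reports what the document declares; as noted above, it cannot tell you which of the declared namespaces is the one you actually care about.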
JLRishe
  • Thank you. I couldn't get the local-name() predicate working (it looks like ElementTree's XPath support is limited). So I think I'll just rely on Apache not releasing another version anytime soon :-) – toolkit Jan 17 '13 at 10:24