9

How can I tell ElementTree to ignore namespaces in an XML file?

For example, I would prefer to query modelVersion (as in statement 1) rather than {http://maven.apache.org/POM/4.0.0}modelVersion (as in statement 2).

pom="""
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>
"""

from xml.etree import ElementTree
ElementTree.register_namespace("","http://maven.apache.org/POM/4.0.0")
root = ElementTree.fromstring(pom)

print(1, root.findall('modelVersion'))
print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))

1 []
2 [<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x1006bff10>]
Mark Harrison
  • AFAIK there isn't an easy+clean way to do so, especially not if you're potentially dealing with multiple namespaces. There appears to be a duplicate question [here](http://stackoverflow.com/q/13412496/20670), but I won't wield my dupehammer if you say that those approaches won't work for you (they kind of look like dirty hacks to me). – Tim Pietzcker Dec 04 '15 at 07:13
  • Also, [`lxml` might be worth looking into](http://stackoverflow.com/q/14853243/20670), but it's not part of the standard library. – Tim Pietzcker Dec 04 '15 at 07:16
  • sadly I'm sending this to someone who can't install lxml. I hope the standard library incorporates it some day. I posted my current solution which makes me very sad coz one time I told my mom I was a professional programmer. :-/ – Mark Harrison Dec 04 '15 at 08:00
  • see also: [Python ElementTree module: How to ignore the namespace of XML files](https://stackoverflow.com/questions/13412496/python-elementtree-module-how-to-ignore-the-namespace-of-xml-files-to-locate-ma/76601149#76601149) – milahu Jul 03 '23 at 06:49

4 Answers

2

There appears to be no straightforward way to do this, so I'd simply wrap the find calls, e.g.:

from xml.etree import ElementTree as ET

POM = """
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
</project>
"""

NSPS = {'foo' : "http://maven.apache.org/POM/4.0.0"}

# Wrapper that prepends the dummy 'foo' prefix, so callers can pass bare tag names
def findall(node, tag):
    return node.findall('foo:' + tag, NSPS)

root = ET.fromstring(POM)
print([ET.tostring(el, encoding='unicode') for el in findall(root, 'modelVersion')])

output:

['<ns0:modelVersion xmlns:ns0="http://maven.apache.org/POM/4.0.0">4.0.0</ns0:modelVersion>\n']
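
As a possible variation on the wrapper above (not part of the original answer): on Python 3.8 and later, the ElementTree path syntax accepts a `{*}` wildcard that matches a tag in any namespace, so no dummy prefix is needed. The helper name `findall_any_ns` is hypothetical, and the sketch assumes an interpreter new enough to support the wildcard.

```python
from xml.etree import ElementTree as ET

POM = """
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
</project>
"""

def findall_any_ns(node, tag):
    # Hypothetical helper: '{*}tag' matches 'tag' in any (or no) namespace.
    # Assumes Python 3.8+, where the wildcard was added to the path syntax.
    return node.findall('{*}' + tag)

root = ET.fromstring(POM)
versions = [el.text for el in findall_any_ns(root, 'modelVersion')]
print(versions)
```

Note that this matches the tag in every namespace, which may be too broad for documents that reuse the same local name under different namespaces.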
decltype_auto
1

Here's what I'm presently doing, which makes me incredibly confident that there's a better way.

$ cat pom.xml |
   tr '\n' ' ' |
   sed 's/<project [^>]*>/<project>/' |
   myprogram |
   sed 's/<project>/<project xmlns="http:\/\/maven.apache.org\/POM\/4.0.0" xmlns:xsi="http:\/\/www.w3.org\/2001\/XMLSchema-instance" xsi:schemaLocation="http:\/\/maven.apache.org\/POM\/4.0.0 http:\/\/maven.apache.org\/maven-v4_0_0.xsd">/'
Mark Harrison
  • instead of sed'ing it in a pipe, you could patch the xml string in the python script or create a dummy namespace and a wrapper function (pls. c my answer below) – decltype_auto Dec 04 '15 at 08:02
  • I like fixing it in the pipe coz then my actual program is tidy. If I can switch to a better xml package in the future I'll just be able to drop the stuff in the wrapper. – Mark Harrison Dec 04 '15 at 08:10
  • Well - if you're already quite happy with your pipe - what exactly are we talking about then :)? – decltype_auto Dec 04 '15 at 08:21
  • lol, good question! I was hoping for an answer like "you dummy here's how to turn off the namespace weirdness" but in the absence of that I'm just hoping for the least bad alternative. For my case, that's keeping the python code clean and hiding the horrible horrible horrible code in the filter step. Although I'm trying hard to figure out how to deliver an lxml solution to my downstream peeps!! – Mark Harrison Dec 04 '15 at 16:20
  • But again - if you want it both as clean as possible now and as invariant as possible with regard to replacing the xml module you import in the future, creating an adaptation layer like I sketched in my answer is the most natural, if not the only, method. Best if it only uses the xml module and does not inherit from it in any way, because in the latter case you'd build your app around the to-be-replaced interface, whereas in the former you'd populate an invariant interface tailored to your app. – decltype_auto Dec 04 '15 at 17:46
1

Here's the equivalent solution without using the shell. Basic idea:

  • translate <project junk...> to <project>
  • perform "clean" processing without worrying about the namespace
  • translate <project> back to <project junk...>

with the new code:

pom="""
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
</project>
"""
short_project="""<project>"""
long_project="""<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">"""

import re
from xml.etree import ElementTree

# eliminate namespace specs
pom = re.sub(r'<project [^>]*>', short_project, pom)

root = ElementTree.fromstring(pom)
ElementTree.dump(root)
print(1, root.findall('modelVersion'))
print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))
mv = root.findall('modelVersion')

# restore the namespace specs
# (encoding='unicode' makes tostring return str rather than bytes on Python 3)
pom = ElementTree.tostring(root, encoding='unicode')
pom = pom.replace(short_project, long_project, 1)
Mark Harrison
0

Rather than ignoring namespaces, another approach is to remove them from the tree after parsing, so there is nothing left to ignore. See nonagon's answer to this question (and my extension of it, which also handles namespaces on attributes): Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"
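
A minimal sketch of that idea, written from the description here rather than copied from the linked answers: walk the parsed tree and rewrite each `{uri}name` tag and attribute key to the bare `name`. The helper name `strip_namespaces` is hypothetical.

```python
from xml.etree import ElementTree as ET

POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
         http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>
"""

def strip_namespaces(root):
    # Hypothetical helper: drop the '{uri}' part of every tag and
    # attribute name, modifying the tree in place.
    for el in root.iter():
        if isinstance(el.tag, str) and '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]
        for key in list(el.attrib):
            if '}' in key:
                el.attrib[key.split('}', 1)[1]] = el.attrib.pop(key)
    return root

root = strip_namespaces(ET.fromstring(POM))
found = root.findall('modelVersion')
print(root.tag, [el.text for el in found])
```

The trade-off is that the namespace information is gone: serializing this tree produces un-namespaced XML, and names that differed only by namespace would collide.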

Community