25

How can one access namespace (NS) attributes using ElementTree?

With the following:

<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">

When I try root.get('xmlns') I get back None. The category and date attributes are fine. Any help appreciated.

Melchior
  • I can't answer your question, but having struggled against this shortcoming for a couple of days I'm prepared to claim that it isn't possible with the current ElementTree API. In my application I needed to detect whether an xmlns:xlink attribute already existed on the root element and, if not, add it. It's not possible to test whether an xmlns attribute already exists and, what is more, ElementTree is happy to add it twice if you try. Since either zero or two identical xmlns attributes in the same element cause an error in most XML consumers, this makes ElementTree very difficult to use. – Rob Smallshire Aug 17 '11 at 19:02
  • This is a very relevant answer now: [from 2017 timeframe](https://stackoverflow.com/a/42372404/13719735) – analytical_prat Aug 04 '22 at 12:32

3 Answers

18

I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.

>>> from xml.etree import ElementTree as ET
>>> data = '''<data xmlns="http://www.foo.net/a"
...                 xmlns:a="http://www.foo.net/a"
...                 book="1" category="ABS" date="2009-12-22"/>'''
>>> element = ET.fromstring(data)
>>> element
<Element {http://www.foo.net/a}data at 1013b74d0>
>>> element.tag
'{http://www.foo.net/a}data'
>>> element.attrib
{'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
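
To make the behaviour above concrete, here is a minimal sketch, assuming the standard-library xml.etree.ElementTree. The a:id attribute is hypothetical and is not part of the original question; it is added only to show how a prefixed attribute is exposed. Namespace declarations are consumed by the parser and do not appear in attrib, while a prefixed attribute is stored under a '{uri}name' key.

```python
import xml.etree.ElementTree as ET

# The a:id attribute is hypothetical, added only to illustrate a qualified attribute.
DATA = ('<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" '
        'a:id="7" book="1"/>')

root = ET.fromstring(DATA)

# xmlns declarations are not stored as attributes, so this prints None.
print(root.get("xmlns"))

# Unprefixed attributes keep their plain name.
print(root.get("book"))

# Prefixed attributes are keyed by '{uri}name'.
print(root.get("{http://www.foo.net/a}id"))
```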

If you just want to know the xmlns URI, you can split it out with a function like:

def tag_uri_and_name(elem):
    if elem.tag[0] == "{":
        uri, ignore, tag = elem.tag[1:].partition("}")
    else:
        uri = None
        tag = elem.tag
    return uri, tag
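
As a usage sketch, assuming the standard-library xml.etree.ElementTree, the function can be applied to the sample element from the question. The function is repeated here so that the snippet runs on its own; the unqualified `<plain/>` element is a made-up example for the no-namespace branch.

```python
import xml.etree.ElementTree as ET


def tag_uri_and_name(elem):
    """Split a '{uri}local' tag into (uri, local); uri is None if unqualified."""
    if elem.tag.startswith("{"):
        uri, _, tag = elem.tag[1:].partition("}")
    else:
        uri, tag = None, elem.tag
    return uri, tag


# Sample element from the question, closed with "/>" so that it parses.
DATA = ('<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" '
        'book="1" category="ABS" date="2009-12-22"/>')

qualified = tag_uri_and_name(ET.fromstring(DATA))
unqualified = tag_uri_and_name(ET.fromstring("<plain/>"))  # made-up element

print(qualified)
print(unqualified)
```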

For much more on namespaces and qualified names in ElementTree, see effbot's examples.

gitaarik
Jeffrey Harris
  • Why is there not a function like this in the library? It seems like every xml file with a namespace would need it. Am I missing it? – Clutch May 03 '10 at 14:57
  • @clutch I am wondering the same thing. Anyone know a reason why? – Santa Aug 23 '10 at 21:13
  • @rednaw, I'm not convinced split is better. Partition is guaranteed to return a tuple of exactly three elements, split can return an arbitrary number of elements. In practice it would be syntactically invalid to have anything but one closing curly brace, but still. I think partition is better. – Jeffrey Harris Jun 30 '17 at 19:48
14

Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.

However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces in use, so that element lookups are easier and the URIs are not hardcoded.

Here's what I came up with:

import xml.etree.ElementTree as ET

def parse_and_get_ns(file):
    events = "start", "start-ns"
    root = None
    ns = {}
    for event, elem in ET.iterparse(file, events):
        if event == "start-ns":
            # For "start-ns" events, elem is a (prefix, uri) tuple.
            prefix, uri = elem
            braced = "{%s}" % uri
            # Compare against the braced form, since that is what is stored.
            if prefix in ns and ns[prefix] != braced:
                # NOTE: It is perfectly valid to have the same prefix refer
                #     to different URI namespaces in different parts of the
                #     document. This exception serves as a reminder that this
                #     solution is not robust. Use at your own peril.
                raise KeyError("Duplicate prefix with different URI found.")
            ns[prefix] = braced
        elif event == "start":
            # The first "start" event belongs to the root element.
            if root is None:
                root = elem
    return ET.ElementTree(root), ns

With this you can parse an xml file and obtain a dict with the namespace mappings. So, if you have an xml file like the following ("my.xml"):

<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>

You will be able to use the xml namespaces and get info for elements like dc:creator:

>>> tree, ns = parse_and_get_ns("my.xml")
>>> ns
{u'content': '{http://purl.org/rss/1.0/modules/content/}',
u'dc': '{http://purl.org/dc/elements/1.1/}'}
>>> item = tree.find("./feed/item")
>>> item.findtext(ns['dc']+"creator")
'Joe McGroin'
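
A variant of the same idea, sketched against the standard-library xml.etree.ElementTree: if the map stores plain URIs without braces, it can be passed as the namespaces argument of find and findtext, so prefixed paths work directly. The helper name parse_with_prefix_map is hypothetical, and the sample feed is embedded as bytes so that the snippet runs without a file on disk. Unlike the function above, this sketch lets a later declaration of a prefix silently overwrite an earlier one.

```python
import io
import xml.etree.ElementTree as ET

RSS = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>"""


def parse_with_prefix_map(source):
    """Hypothetical helper: return (root, {prefix: uri}) for a file or file-like object."""
    ns = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # "start-ns" yields a (prefix, uri) tuple
            ns[prefix] = uri       # a repeated prefix overwrites the earlier URI
        elif root is None:
            root = payload         # the first "start" event is the root element
    return root, ns


root, ns = parse_with_prefix_map(io.BytesIO(RSS))
item = root.find("feed/item")
creator = item.findtext("dc:creator", namespaces=ns)
print(creator)
```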
Jordan Reiter
deancutlet
  • You answered my post at http://stackoverflow.com/questions/13018024/converting-my-python-script-from-lxml-to-xml-etree/13019393#13019393 – Fletcher Moore Oct 22 '12 at 20:35
  • I found a small bug in your code. I fixed it by setting `ns[elem[0]]` to `elem[1]` inside the for loop, because ET namespace dicts don't need the braces. – samwyse Sep 01 '18 at 15:48
1

Try this:

import xml.etree.ElementTree as ET
import re
import sys

with open(sys.argv[1]) as f:
    root = ET.fromstring(f.read())
    xmlns = ''
    m = re.search(r'\{.*\}', root.tag)
    if m:
        xmlns = m.group(0)
    print(root.find(xmlns + 'the_tag_you_want').text)
Garcia Sylvain