35

This XML file is named example.xml:

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>4.0.0</modelVersion>
  <groupId>com.foobar.flubber</groupId>
  <artifactId>uberportalconf</artifactId>
  <version>13-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Environment for UberPortalConf</name>
  <description>This is the description</description>    
  <properties>
      <birduberportal.version>11</birduberportal.version>
      <promotiondevice.version>9</promotiondevice.version>
      <foobarportal.version>6</foobarportal.version>
      <eventuberdevice.version>2</eventuberdevice.version>
  </properties>
  <!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>

If I load example.xml and parse it with ElementTree I can see its namespace is http://maven.apache.org/POM/4.0.0.

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print tree.getroot()
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>

I have not found a method that returns just the namespace of an Element without resorting to parsing str(an_element). It seems like there has got to be a better way.

Deleted
9 Answers

33

This is a perfect task for a regular expression.

import re

def namespace(element):
    m = re.match(r'\{.*\}', element.tag)
    return m.group(0) if m else ''
Mark Ransom
  • 15
    After fighting with this issue for a while, this is the best solution I found. I can't believe the API doesn't give you a way to ask for the namespace and, at the same time, doesn't return the 'xmlns' attribute when calling 'rootElement.keys()'. I'm sure there is a good reason for that, but I can't find it at the moment. – Robert Jul 09 '15 at 18:03
  • 1
    Please add `r` before the regular expression to make this answer perfect. – JustWe Jul 31 '19 at 01:29
  • @Jiu thank you so much. I can't believe I missed that. – Mark Ransom Jul 31 '19 at 03:30
  • 1
    to get namespace without curly braces included: re.match(r'\{(.*)\}', element.tag).group(1) – Piotr Boho Jan 15 '21 at 14:45
  • After adding `r` before the regex, the `\`s are redundant. – FNia Jan 31 '22 at 21:14
  • @FNia no they're not. You need a way to tell the regex engine that you're looking for the literal character and it shouldn't try to interpret it. – Mark Ransom Jan 31 '22 at 22:14
  • @MarkRansom Removing the `\`s certainly works (I tried it), but I apologize: it has nothing to do with the `r`. It just works with curly braces in this case because they can't be interpreted any other way. Other special characters would need to be escaped. – FNia Feb 01 '22 at 04:35
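
The exchange above can be checked directly. The sketch below assumes Python 3 and uses a sample tag chosen for illustration. It suggests that both patterns match here because `re` treats a brace that does not form a valid `{m,n}` quantifier as a literal character. Escaping is still the safer habit, since braces that do form a quantifier change the meaning of the pattern.

```python
import re

# Sample tag in ElementTree's '{uri}localname' form (illustrative value).
tag = '{http://maven.apache.org/POM/4.0.0}project'

escaped = re.match(r'\{.*\}', tag)
# Here '{' is not followed by a valid quantifier body, so re treats it literally.
unescaped = re.match(r'{.*}', tag)

# Both patterns capture the same '{uri}' prefix for this tag.
same_result = escaped.group(0) == unescaped.group(0)

# When the braces do form a quantifier, escaping changes the meaning:
repeats_a = re.fullmatch(r'a{2}', 'aa') is not None           # 'a' repeated twice
literal_braces = re.fullmatch(r'a\{2\}', 'a{2}') is not None  # the literal text 'a{2}'

print(same_result, repeats_a, literal_braces)
```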
27

The namespace should be in Element.tag right before the "actual" tag:

>>> root = tree.getroot()
>>> root.tag
'{http://maven.apache.org/POM/4.0.0}project'

To learn more about namespaces, take a look at ElementTree: Working with Namespaces and Qualified Names.
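
Once the URI has been read out of the tag, it can be handed back to `find()` to locate namespaced children. The following is only a sketch, assuming Python 3; the trimmed-down POM string and the prefix `m` are made up for illustration and are not part of the original file.

```python
from xml.etree import ElementTree

# A cut-down, illustrative version of example.xml.
POM = '''<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>uberportalconf</artifactId>
</project>'''

root = ElementTree.fromstring(POM)

# The tag has the form '{uri}localname'; take the text between the braces.
uri = root.tag[1:root.tag.index('}')] if root.tag.startswith('{') else ''

# Map a hypothetical prefix 'm' to the URI and use it in the search path.
ns = {'m': uri}
artifact = root.find('m:artifactId', ns)

print(uri)
print(artifact.text)
```

The prefix used in the search path does not have to match anything in the document; only the URI it maps to matters.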

Rik Poggi
  • 2
    The link you've provided is dead, you might want to edit it to point to an alternative source for this piece of information. – M463 May 11 '21 at 20:01
12

I am not sure if this is possible with xml.etree, but here is how you could do it with lxml.etree:

>>> from lxml import etree
>>> tree = etree.parse('example.xml')
>>> tree.xpath('namespace-uri(.)')
'http://maven.apache.org/POM/4.0.0'
Jakub Roztocil
  • 1
    I get `unresolved import: etree` using Python 2.7.2 in Windows. `xpath` wasn't available as a method when using `xml.etree` and if I use `find()` (which supports xpath expressions) the `'namespace-uri(.)'` statement still doesn't work. – Deleted Mar 02 '12 at 14:55
  • this is exactly what i was looking for, [see pr on gh](https://github.com/samatjain/gpxsplitter/pull/3) – Andreas Scherer Sep 21 '15 at 09:43
  • This has been the best solution that I've seen. I normally use xmlstarlet but I may switch now. – Tandy Freeman Apr 22 '16 at 20:42
  • Why isn't this marked as answer? this is precisely what is asked for here! ... Except the part it is lxml and not xml... – Nebulosar Feb 26 '18 at 12:33
  • 4
    for `lxml` a simpler way to get the namespace is `tree.getroot().nsmap` – ccpizza Apr 11 '18 at 09:43
  • @ccpizza With [this example](https://stackoverflow.com/questions/18067800/xmlns-namespace-breaking-lxml) I have to use `tree.getroot().nsmap[None]`. Is the example misformatted ? – Jona Oct 25 '19 at 11:46
  • 1
    @Jona: I'd assume that using `None` is a way to address the default namespace, i.e., the one which is declared without a prefix. – ccpizza Oct 25 '19 at 18:06
10

Without using regular expressions:

>>> root
<Element '{http://www.google.com/schemas/sitemap/0.84}urlset' at 0x2f7cc10>

>>> root.tag.split('}')[0].strip('{')
'http://www.google.com/schemas/sitemap/0.84'
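
One caveat with the one-liner: for a tag that has no namespace it returns the tag name itself. A guarded variant might look like the sketch below; the helper name `namespace_uri` is hypothetical, and the code assumes tags in ElementTree's `{uri}localname` form.

```python
def namespace_uri(tag):
    """Return the namespace URI of an ElementTree tag, or '' if it has none."""
    if tag.startswith('{'):
        # Drop the leading '{' and keep everything before the first '}'.
        return tag[1:].partition('}')[0]
    return ''


print(namespace_uri('{http://www.google.com/schemas/sitemap/0.84}urlset'))
print(repr(namespace_uri('urlset')))
```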
Lorcan
2

The lxml.etree library's elements have a dictionary called nsmap, which shows all the namespaces that are in use in the current tag scope.

>>> item = next(tree.getroot().iter())
>>> item.nsmap
{'md': 'urn:oasis:names:tc:SAML:2.0:metadata'}
Cypher
1

The short answer is:

next(uri for uri, prefix in ElementTree._namespace_map.items() if prefix == '')

but only if you have been calling

ElementTree.register_namespace(prefix,uri)

in response to every event == "start-ns" received while iterating through the result of

ElementTree.iterparse(...)

with "start-ns" included in the events you requested.
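
The "start-ns" events can also be collected directly, without touching the module's private namespace map. This is a minimal sketch, assuming Python 3; the helper name `declared_namespaces` and the sample bytes are made up for illustration.

```python
import io
from xml.etree import ElementTree


def declared_namespaces(source):
    """Collect the (prefix, uri) pairs reported by 'start-ns' events."""
    return [value
            for event, value in ElementTree.iterparse(source, events=['start-ns'])]


# Illustrative document; any file name or file object can be passed instead.
xml_bytes = (b'<project xmlns="http://maven.apache.org/POM/4.0.0" '
             b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>')

pairs = declared_namespaces(io.BytesIO(xml_bytes))

# The default namespace is reported with an empty prefix.
default_uri = dict(pairs).get('')
print(default_uri)
```

Note that this flattens all declarations in the document into one list, so a default namespace re-declared further down the tree would overwrite the earlier one in the `dict`.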

To answer the question "what is the default namespace?", it is necessary to clarify two points:

(1) XML specifications say that the default namespace is not necessarily global throughout the tree, rather the default namespace can be re-declared at any element under root, and inherits downwards until meeting another default namespace re-declaration.

(2) The ElementTree module can (de facto) handle XML-like documents which have no default namespace on the root, if they use no namespaces anywhere in the document. (There may be less strict conditions; that is an "if", not necessarily an "iff".)
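
Both points can be observed with a small experiment. The sketch below assumes Python 3; the documents and URIs are invented for illustration.

```python
from xml.etree import ElementTree

# Point (1): the default namespace is re-declared on <b> and inherited by <c>,
# while <d> falls back to the default declared on the root.
DOC = '''<a xmlns="http://A">
  <b xmlns="http://B0"><c/></b>
  <d/>
</a>'''
tags = [el.tag for el in ElementTree.fromstring(DOC).iter()]
print(tags)

# Point (2): a document without any namespace parses fine; tags are plain names.
plain_root = ElementTree.fromstring('<a><b/></a>')
print(plain_root.tag)
```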

It's probably also worth considering "what do you want it for?" Consider that XML files can be semantically equivalent, but syntactically very different. E.g., the following three files are semantically equivalent, but A.xml has one default namespace declaration, B.xml has three, and C.xml has none.

A.xml:
<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">
     <nsB0:b/>
     <nsB1:b/>
</a>

B.xml:
<a xmlns="http://A">
     <b xmlns="http://B0"/>
     <b xmlns="http://B1"/>
</a>

C.xml:
<{http://A}a>
     <{http://B0}b/>
     <{http://B1}b/>
</{http://A}a>

The file C.xml is not well-formed XML; it shows the canonical expanded form, {uri}localname, in which ElementTree presents tags to its search functions.
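
The equivalence of the first two files can be checked by comparing the expanded tags ElementTree produces for each. This is a sketch assuming Python 3, with the documents inlined as strings (using the standard `xmlns` spelling) rather than read from A.xml and B.xml; the helper name `expanded_tags` is hypothetical.

```python
from xml.etree import ElementTree

A = ('<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">'
     '<nsB0:b/><nsB1:b/></a>')
B = ('<a xmlns="http://A">'
     '<b xmlns="http://B0"/><b xmlns="http://B1"/></a>')


def expanded_tags(xml_text):
    """Return every tag in document order, in '{uri}localname' form."""
    return [el.tag for el in ElementTree.fromstring(xml_text).iter()]


print(expanded_tags(A))
print(expanded_tags(A) == expanded_tags(B))
```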

If you are certain a priori that there will be no namespace collisions, you can modify the element tags while parsing as discussed here: Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"

Craig Hicks
1

I think it will be easier to take a look at the attributes:

>>> root.attrib
{'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation':
   'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd'}
jcollado
  • Certainly easier than parsing `str(the_element)`. But I guess parsing `the_element.tag` is even a bit easier. As I am only interested in the namespace. What do you think? – Deleted Mar 02 '12 at 15:13
  • 1
    I think that @RikPoggi's answer seems the best one (actually, I upvoted it). In fact, getting the namespace should be as easy as `re.search('\{(.*)\}', the_element.tag).group(1)`. With my answer it looks like you could use `the_element.attrib.values()[0].split()[0]`, but, indeed, it doesn't look as straightforward and it isn't guaranteed that you won't get any other attributes in the future. – jcollado Mar 02 '12 at 15:21
-1

Combining some of the answers above, I think the shortest code is:

theroot = tree.getroot()
theroot.attrib[theroot.keys()[0]]
  • 2
    This is not accurate, because the xmlns might not be the first attribute of the root. In fact, I'm currently trying to parse a TCX file and the xmlns isn't showing up as an attribute of the root at all. – Mitch Lindgren Jan 23 '20 at 23:26
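
The comment above matches what ElementTree does with namespace declarations: they are consumed by the parser and do not appear among the element's attributes. A short sketch, assuming Python 3 and a cut-down version of the root element from the question:

```python
from xml.etree import ElementTree

XML = ('<project xmlns="http://maven.apache.org/POM/4.0.0" '
       'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
       'xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 '
       'http://maven.apache.org/maven-v4_0_0.xsd"/>')

root = ElementTree.fromstring(XML)

# Only xsi:schemaLocation survives as an attribute; neither xmlns
# declaration is listed.
attribute_names = list(root.attrib)
print(attribute_names)

# The namespace itself is still available from the tag.
print(root.tag)
```

So reading the first attribute only works by coincidence when a schemaLocation happens to be present and happens to start with the namespace URI.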
-1

Here is my solution for ElementTree on Python 3.9+:

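import xml.etree.ElementTree as ET

def get_element_namespaces(filename, element):
    namespaces = []
    for event, value in ET.iterparse(filename, events=['start', 'start-ns']):
        if event == 'start-ns':
            # value is a (prefix, uri) tuple declared on the element that starts next
            namespaces.append(value)
        else:
            # value is the element that just started
            if ET.tostring(element) == ET.tostring(value):
                return namespaces
            # not the element we are looking for: discard its declarations
            namespaces = []
    return namespaces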

This would return a list of (prefix, URI) tuples like this:

[('android', 'http://schemas.android.com/apk/res/android'), ('tools', 'http://schemas.android.com/tools')]
amrezzd