2

I'm trying to parse an XML document with a default namespace, i.e. the root node has an xmlns attribute. This is annoying if you want to try find certain tags in the child nodes because each tag is prefixed with the default namespace.

xml.etree.ElementTree.findall() allows for a namespaces dictionary to be passed in but I can't seem to find what the default namespace is mapped to. I have tried using 'default', None, 'xmlns' with no success.

One option that does seem to work is to prefix the tag passed to findall() with 'xmlns:' (EDIT: this can be any arbitrary unique name actually) and a corresponding entry in the namespaces dictionary but I'm wondering if that is even necessary.

EDIT: I should mention this is Python 3.3.2. I believe in older versions of Python, findall() does not accept a namespaces argument.

Alex
  • 428
  • 4
  • 14
  • possible duplicate of [Parsing XML with namespace in Python ElementTree](http://stackoverflow.com/questions/14853243/parsing-xml-with-namespace-in-python-elementtree). (answer appears to be "yes, necessary") – torek Aug 11 '13 at 12:16

1 Answers1

2

Decided to take a look at the source code for the method. So as it turns out, the following code in xml.etree.ElementPath is doing the dirty work:

def xpath_tokenizer(pattern, namespaces=None):
    for token in xpath_tokenizer_re.findall(pattern):
        tag = token[1]
        if tag and tag[0] != "{" and ":" in tag:
            try:
                prefix, uri = tag.split(":", 1)
                if not namespaces:
                    raise KeyError
                yield token[0], "{%s}%s" % (namespaces[prefix], uri)
            except KeyError:
                raise SyntaxError("prefix %r not found in prefix map" % prefix)
        else:
            yield token

pattern is the tag passed into findall(). If there is no : found in the tag, the tokenizer simply returns the tag back without any substitutions.

Alex
  • 428
  • 4
  • 14