
My XML looks like:

...
<termEntry id="c1">
    <langSet xml:lang="de">
    ...

And I have the code:

from lxml import etree
...

for term_entry in root.iterfind('.//termEntry'):
    print(term_entry.attrib['id'])
    print(term_entry.nsmap)

    for lang_set in term_entry.iterfind('langSet'):
        print(lang_set.nsmap)
        print(lang_set.attrib)

        for some_stuff in lang_set.iterfind('some_stuff'):
            ...

I get an empty nsmap dict, and my attrib dict looks like {'{http://www.w3.org/XML/1998/namespace}lang': 'en'}.
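A minimal, self-contained sketch that reproduces this observation. The sample document is hypothetical, since the real file is elided above. It shows that the `xml` prefix never appears in `nsmap` (it is never declared), while the attribute key uses lxml's `{namespace}localname` notation.

```python
from lxml import etree

# Hypothetical stand-in for the elided TBX-style input file.
XML = b"""<body>
  <termEntry id="c1">
    <langSet xml:lang="de"><term>Haus</term></langSet>
    <langSet xml:lang="en"><term>house</term></langSet>
  </termEntry>
</body>"""

root = etree.fromstring(XML)

for term_entry in root.iterfind('.//termEntry'):
    print(term_entry.get('id'))          # c1
    print(term_entry.nsmap)              # {} (no xmlns declarations in scope)
    for lang_set in term_entry.iterfind('langSet'):
        print(lang_set.nsmap)            # {} (the xml prefix is never declared)
        # The key is '{http://www.w3.org/XML/1998/namespace}lang', not 'xml:lang'.
        print(dict(lang_set.attrib))
```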

The file may not use the xml: prefix, or it may use a different namespace. How can I find out which namespace is used in the tag declaration? In fact, I just need to get the lang attribute; I don't care which namespace was used. I don't want to use ugly hacks like lang_set.attrib.values()[0] or other lookups of a field by a hard-coded full name.
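If only the local name matters, one way to avoid both `attrib.values()[0]` and hand-written `{...}` slicing is to let `etree.QName` split each attribute key into namespace and local name. This is a minimal sketch; `get_attr_by_localname` is a hypothetical helper, not part of lxml, and it raises `ValueError` when the same local name occurs in more than one namespace instead of guessing.

```python
from lxml import etree


def get_attr_by_localname(element, localname, default=None):
    """Hypothetical helper (not an lxml API): return the value of the attribute
    whose local name is `localname`, whatever namespace it is in.

    Raises ValueError if several namespaces provide an attribute with that name.
    """
    # QName splits '{uri}lang' into .namespace='uri' and .localname='lang';
    # a plain 'id' key gives .namespace=None and .localname='id'.
    matches = [value for key, value in element.attrib.items()
               if etree.QName(key).localname == localname]
    if len(matches) > 1:
        raise ValueError('ambiguous attribute %r: %d matches' % (localname, len(matches)))
    return matches[0] if matches else default


lang_set = etree.fromstring(b'<langSet xml:lang="de"/>')
print(get_attr_by_localname(lang_set, 'lang'))  # de
```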

night-crawler
  • Isn't finding the pair '{', '}' in the string and removing the delimited substring a simple-but-good solution for you? – mmgp Dec 14 '12 at 02:48
  • Only if there are no other solutions. I just want to know where the `{http://www.w3.org/XML/1998/namespace}` string is stored for attributes. It would be better to use `lang_set.attrib.get('{%s}lang' % namespace)`. – night-crawler Dec 14 '12 at 02:55
  • I'm not clear exactly what you're trying to do. Can you give a more explicit example? It seems like you already know how to request the `lang` attribute from the `http://www.w3.org/XML/1998/namespace` namespace; are you saying you also want to be able to get the `lang` attribute from other namespaces, too? Because that doesn't necessarily make sense. But maybe I've misunderstood the question. – larsks Dec 14 '12 at 03:01
  • "are you saying you also want to be able to get the lang attribute from other namespaces, too?" Yes. The namespaces may differ, so I just want to know their names. First I parse the file; later I have to save a file with a similar syntax. Saving the file without namespaces won't fail, but... It would be really strange if lxml did not provide any method for namespace processing. – night-crawler Dec 14 '12 at 03:14
  • Or is there a method to get the element's current namespace name with xpath? – night-crawler Dec 14 '12 at 03:15
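On the XPath question: XPath 1.0 has `local-name()` and `namespace-uri()`, which lxml's `xpath()` supports. A small sketch on a hypothetical element; the first expression ignores the namespace entirely, the second reports which namespace the matching attribute is in, and the third gives the element's own namespace (an empty string when it has none).

```python
from lxml import etree

# Hypothetical sample element; the real file is elided in the question.
lang_set = etree.fromstring(b'<langSet xml:lang="de"/>')

# Values of every attribute whose local name is "lang", in any namespace.
values = lang_set.xpath('@*[local-name()="lang"]')

# Namespace URI of the first such attribute ('' if it has no namespace).
ns_uri = lang_set.xpath('namespace-uri(@*[local-name()="lang"])')

# Namespace URI of the context element itself ('' here: langSet is not namespaced).
element_ns = lang_set.xpath('namespace-uri()')

print(values, ns_uri)
```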

2 Answers


i just need to get a lang attribute, i don't care what namespace was used

Your question is not very clear and you haven't provided any complete runnable code example. But doing some string manipulation as suggested by @mmgp in a comment may be enough.

However, xml:lang is not the same as random_prefix:lang (or just lang). I think you should care about the namespace. If the objective is to identify the natural language that applies to an element's content, then you should be using xml:lang (because that is the explicit purpose of this attribute; see http://www.w3.org/TR/REC-xml/#sec-lang-tag).
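To illustrate why the namespace matters: `xml:lang`, `foo:lang` and an unprefixed `lang` are three distinct attributes that can sit on the same element, so "ignore the namespace" can be ambiguous. The element and the `urn:example:foo` namespace below are made up for the example.

```python
from lxml import etree

XML_NS = 'http://www.w3.org/XML/1998/namespace'

# Hypothetical element carrying three different attributes that are all called "lang".
el = etree.fromstring(
    b'<langSet xmlns:foo="urn:example:foo" xml:lang="de" foo:lang="fr" lang="en"/>')

print(el.get('{%s}lang' % XML_NS))      # de (the standard xml:lang)
print(el.get('{urn:example:foo}lang'))  # fr (unrelated attribute in another namespace)
print(el.get('lang'))                   # en (no namespace at all)
```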


I just want to know where is stored the {http://www.w3.org/XML/1998/namespace} string for attributes.

It is important to know that the xml prefix is special. It is reserved (as opposed to almost all other namespace prefixes which are supposed to be arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace.

From the Namespaces in XML 1.0 W3C recommendation:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Other uses of the xml prefix are the xml:space and xml:base attributes.
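A quick check of the points above with a hypothetical document: no `xmlns:xml` declaration is needed, and `xml:lang`, `xml:space` and `xml:base` all land in the same reserved namespace, which is also why none of them shows up in `nsmap`.

```python
from lxml import etree

XML_NS = 'http://www.w3.org/XML/1998/namespace'

# No xmlns:xml declaration anywhere: the xml prefix is bound by definition.
el = etree.fromstring(
    b'<doc xml:lang="de" xml:space="preserve" xml:base="http://example.com/"/>')

for key in el.attrib:
    qn = etree.QName(key)
    print(qn.namespace == XML_NS, qn.localname)
# True lang
# True space
# True base
```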


It is really strange, if lxml does not provide any method for namespace processing

lxml processes namespaces just fine, but prefixes are avoided as much as possible. You will need to use the http://www.w3.org/XML/1998/namespace namespace name when doing lookups that involve the xml prefix.
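This also covers the "save the file again" case from the comments: a minimal sketch, assuming a freshly built element. You set the attribute by namespace name, and lxml picks the reserved `xml` prefix itself when serializing, without emitting any declaration.

```python
from lxml import etree

XML_NS = 'http://www.w3.org/XML/1998/namespace'
LANG = etree.QName(XML_NS, 'lang').text   # '{http://www.w3.org/XML/1998/namespace}lang'

lang_set = etree.Element('langSet')
lang_set.set(LANG, 'en')                   # set by namespace name, not by prefix
out = etree.tostring(lang_set)
print(out)                                 # b'<langSet xml:lang="en"/>'

# Round trip: the parsed copy is looked up the same way.
print(etree.fromstring(out).get(LANG))     # en
```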

mzjn
  • I've made my own analog of QName that returns the prefix and the local name. But I don't like string operations at all. The namespace/prefix is one entity and the local name is another; here they are mixed, and I have to create another interface for attribute lookups without namespaces/prefixes. In my opinion, `.attrib` should be list-based and contain QName() instances for attributes, with lookup methods, e.g. `get('attr_name', namespace=None)`, that return the value of an attribute if it exists and raise MultipleAttributesException if the lookup is ambiguous. – night-crawler Dec 15 '12 at 22:36
  • And here we have 1) attribute prefix resolution, 2) concatenation of the resolved prefix and the attribute name, 3) parsing of the concatenated string with a regexp, and 4) a couple of problems like the one in my question. And this is not the first question about lxml namespaces on StackOverflow. It would be simpler to have a structure for storing attributes, like `a.prefix, a.namespace, a.name, a.value`. But anyway, thanks for your answer; I'll close the question. – night-crawler Dec 15 '12 at 22:43
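The `a.prefix, a.namespace, a.name, a.value` structure proposed in the comment can be approximated on top of lxml. This is a sketch, not an lxml API: `Attr` and `iter_attributes` are hypothetical names, and the prefix is recovered by reversing the element's in-scope `nsmap`, with `xml` added by hand because it never appears there.

```python
from collections import namedtuple
from lxml import etree

XML_NS = 'http://www.w3.org/XML/1998/namespace'

# Hypothetical record type modelled on the structure proposed in the comment.
Attr = namedtuple('Attr', 'prefix namespace name value')


def iter_attributes(element):
    """Yield one Attr per attribute of `element` (a sketch, not an lxml API)."""
    # Reverse the in-scope prefix map. Unprefixed attributes are never in the
    # default namespace, so the None (default) prefix is skipped. If two
    # prefixes map to the same URI, only one of them is kept.
    ns_to_prefix = {uri: prefix for prefix, uri in element.nsmap.items() if prefix}
    # The xml prefix is bound by definition and never shows up in nsmap.
    ns_to_prefix[XML_NS] = 'xml'
    for key, value in element.attrib.items():
        qn = etree.QName(key)
        prefix = ns_to_prefix.get(qn.namespace) if qn.namespace else None
        yield Attr(prefix, qn.namespace, qn.localname, value)


el = etree.fromstring(
    b'<langSet xmlns:foo="urn:example:foo" xml:lang="de" foo:lang="fr" id="x"/>')
for a in iter_attributes(el):
    print(a)
```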

You could simply use XPath:

lang_set.xpath('./@xml:lang')[0]
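A runnable version of that one-liner on a hypothetical element. As far as I know, libxml2's XPath engine (which lxml uses) binds the `xml` prefix by definition, so the short form resolves without a `namespaces=` mapping. Indexing with `[0]` raises `IndexError` when the attribute is absent; wrapping the path in `string()` returns an empty string instead.

```python
from lxml import etree

# Hypothetical sample; in the question lang_set comes from iterfind('langSet').
lang_set = etree.fromstring(b'<langSet xml:lang="de"><term>Haus</term></langSet>')

# No namespaces= mapping: the xml prefix is known to the XPath engine.
lang = lang_set.xpath('./@xml:lang')[0]
print(lang)  # de

# string() yields '' instead of raising IndexError when the attribute is missing.
missing = etree.fromstring(b'<langSet/>').xpath('string(@xml:lang)')
```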

By the way, are you working with TBX files?

altipard
  • To me it appears that it would be more like `lang_set.xpath('./@xml:lang', namespaces={'xml':'http://www.w3.org/XML/1998/namespace'})[0]` – mapto Dec 15 '20 at 05:40