Find nodes defined in corrupted namespace

Question

I've downloaded this XML file.

I'm trying to get includingNote as follows:

...
namespaces = { "skos" : "http://www.w3.org/2004/02/skos/core#", "xml" : "http://www.w3.org/XML/1998/namespace", 
                 "udc" : "http://udcdata.info/udc-schema#" }
...


includingNote = child.find("udc:includingNote[@xml:lang='en']", namespaces)
if includingNote:
  print includingNote.text.encode("utf8")

The scheme is located here and seems to be corrupted.

Is there a way I can print includingNote for each child node?

score 1 · Accepted Answer · edited May 23 '17 at 12:00

It is true that the skos prefix is not declared in udc-scheme, but searching the XML document is not a problem.

The following program extracts 639 includingNote elements:

from xml.etree import cElementTree as ET

namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
includingNotes = doc.findall(".//udc:includingNote[@xml:lang='en']", namespaces)

print len(includingNotes)   # 639

for i in includingNotes:
    print i.text

Note the use of findall() and .// in front of the element name in order to search the whole document.
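To make the difference concrete, here is a minimal sketch that runs without the downloaded file. The inline SAMPLE document and its note texts are made up for illustration and only mimic the shape of the real RDF file; the namespace URIs are the ones used above. The code is written so that it should run on both Python 2.7 and Python 3.

```python
from xml.etree import ElementTree as ET

# Hypothetical miniature of the RDF file; the real document is much larger.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:udc="http://udcdata.info/udc-schema#">
  <skos:Concept>
    <udc:includingNote xml:lang="en">Knowledge</udc:includingNote>
    <udc:includingNote xml:lang="de">Wissen</udc:includingNote>
  </skos:Concept>
  <skos:Concept>
    <udc:includingNote xml:lang="en">Documentation</udc:includingNote>
  </skos:Concept>
</rdf:RDF>"""

namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

root = ET.fromstring(SAMPLE)
query = "udc:includingNote[@xml:lang='en']"

# Without ".//" only direct children of the root are examined.
direct = root.findall(query, namespaces)

# With ".//" the search covers descendants at any depth.
anywhere = root.findall(".//" + query, namespaces)
texts = [e.text for e in anywhere]

print(len(direct))
print(texts)
```

In this sample the notes sit one level below the root, inside skos:Concept, so the query without .// matches nothing while the .// form finds both English notes.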

Here is a variant that returns the same information by first finding all Concept elements:

from xml.etree import cElementTree as ET

namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "skos" : "http://www.w3.org/2004/02/skos/core#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
concepts = doc.findall(".//skos:Concept", namespaces)

for c in concepts:
    includingNote = c.find("udc:includingNote[@xml:lang='en']", namespaces)
    if includingNote is not None:
        print includingNote.text

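Note the use of is not None. Without it, the check fails: an Element that has no child elements evaluates as false in a boolean context, so a plain if includingNote: is skipped even when find() returned a match. This is a peculiarity of ElementTree. See Why does bool(xml.etree.ElementTree.Element) evaluate to False?.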
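The two cases that a plain truth test cannot tell apart can be sketched as follows. The element names here are hypothetical and chosen only for illustration. The sketch inspects len() instead of calling bool() on the element, on the assumption that the truth test follows the child count as described in the linked question; newer Python versions warn about truth-testing an Element.

```python
from xml.etree import ElementTree as ET

# Hypothetical document: <note> contains text but no child elements.
root = ET.fromstring("<root><note>text only</note></root>")

note = root.find("note")        # match found; the element has no children
missing = root.find("absent")   # no match; find() returns None

# A found element can still have zero children, which is why
# "if note:" is not a reliable way to test for a match.
found_but_childless = note is not None and len(note) == 0

print(found_but_childless)
print(missing is None)
print(note.text)
```

Comparing against None distinguishes "no match" from "match without children", which is the case every includingNote element falls into, since it holds only text.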

Well, it works for me. Please provide more details. What version of Python do you use? I use 2.7.12. — mzjn, Oct 02 '16 at 07:57
