I've downloaded this XML file.

I'm trying to get includingNote as follows:

...
namespaces = { "skos" : "http://www.w3.org/2004/02/skos/core#", "xml" : "http://www.w3.org/XML/1998/namespace", 
                 "udc" : "http://udcdata.info/udc-schema#" }
...


includingNote = child.find("udc:includingNote[@xml:lang='en']", namespaces)
if includingNote:
  print includingNote.text.encode("utf8")

The scheme is located here and seems to be corrupted.

Is there a way I can print includingNote for each child node?

xralf

1 Answer

It is true that the skos prefix is not declared in udc-scheme, but searching the XML document is not a problem.

The following program extracts 639 includingNote elements:

from xml.etree import cElementTree as ET

namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
includingNotes = doc.findall(".//udc:includingNote[@xml:lang='en']", namespaces)

print len(includingNotes)   # 639

for i in includingNotes:
    print i.text

Note the use of findall() and .// in front of the element name in order to search the whole document.
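
For readers without the file at hand, the effect of .// can be tried on a small inline document. This is only a sketch: the sample XML is made up for illustration and is not taken from udcsummary-skos.rdf, and it uses Python 3 syntax with xml.etree.ElementTree, since cElementTree was removed in Python 3.9.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely shaped like a SKOS/UDC file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:udc="http://udcdata.info/udc-schema#">
  <skos:Concept>
    <udc:includingNote xml:lang="en">Logic</udc:includingNote>
    <udc:includingNote xml:lang="de">Logik</udc:includingNote>
  </skos:Concept>
  <skos:Concept>
    <udc:includingNote xml:lang="en">Ethics</udc:includingNote>
  </skos:Concept>
</rdf:RDF>"""

namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants of root, at any depth.
notes = root.findall(".//udc:includingNote[@xml:lang='en']", namespaces)
texts = [n.text for n in notes]
print(texts)        # ['Logic', 'Ethics']

# Without ".//" only direct children of root are examined, so nothing matches.
direct = root.findall("udc:includingNote[@xml:lang='en']", namespaces)
print(len(direct))  # 0
```

The second query comes back empty because the includingNote elements sit one level below the root, inside the Concept elements.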


Here is a variant that returns the same information by first finding all Concept elements:

from xml.etree import cElementTree as ET

namespaces = {"udc" : "http://udcdata.info/udc-schema#",
              "skos" : "http://www.w3.org/2004/02/skos/core#",
              "xml" : "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
concepts = doc.findall(".//skos:Concept", namespaces)

for c in concepts:
    includingNote = c.find("udc:includingNote[@xml:lang='en']", namespaces)
    if includingNote is not None:
        print includingNote.text

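Note the use of is not None. Without it, the test fails even when an element is found: the truth value of an Element depends on whether it has child elements, so an includingNote element that contains only text evaluates as false. See Why does bool(xml.etree.ElementTree.Element) evaluate to False?.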
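
The difference between "not found" and "found but childless" can be checked in isolation. The snippet below is a minimal sketch in Python 3 syntax; the element names are invented for the example. It inspects len() instead of calling bool() on the element, because truth-testing an Element has historically followed its number of children and newer Python versions warn about doing so.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-element document: <note> holds text but no child elements.
root = ET.fromstring("<root><note>text only</note></root>")

note = root.find("note")        # found; a leaf element
missing = root.find("absent")   # not found; find() returns None

print(note is not None)   # True
print(len(note))          # 0 (no child elements, so a truth test would fail)
print(missing is None)    # True

# The reliable check for "was something found" is the identity test.
if note is not None:
    print(note.text)      # text only
```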

mzjn