It is true that the skos prefix is not declared in udc-scheme, but searching the XML document is not a problem. The following program extracts 639 includingNote elements:
import xml.etree.ElementTree as ET

# Map the prefixes used in the search path to their namespace URIs.
namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
# ".//" searches the whole document, not just children of the root.
includingNotes = doc.findall(".//udc:includingNote[@xml:lang='en']", namespaces)
print(len(includingNotes))  # 639

for i in includingNotes:
    print(i.text)
Note the use of findall() and .// in front of the element name in order to search the whole document.
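To see what the .// prefix changes, here is a small self-contained sketch. The inline XML is a made-up stand-in for udcsummary-skos.rdf: only the namespace URIs come from the program above, while the document structure and the note texts are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the RDF file; the real file is much larger.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:udc="http://udcdata.info/udc-schema#">
  <skos:Concept>
    <udc:includingNote xml:lang="en">Example note</udc:includingNote>
    <udc:includingNote xml:lang="de">Beispiel</udc:includingNote>
  </skos:Concept>
  <skos:Concept/>
</rdf:RDF>"""

namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

root = ET.fromstring(SAMPLE)

# Without ".//", only direct children of the root element are searched.
direct = root.findall("udc:includingNote[@xml:lang='en']", namespaces)

# With ".//", the search descends through all levels of the tree.
nested = root.findall(".//udc:includingNote[@xml:lang='en']", namespaces)

print(len(direct))  # 0
print(len(nested))  # 1
```

In this sample the notes sit one level below the root, inside a Concept element, so only the .// form finds the English note.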
Here is a variant that returns the same information by first finding all Concept elements:
import xml.etree.ElementTree as ET

namespaces = {"udc": "http://udcdata.info/udc-schema#",
              "skos": "http://www.w3.org/2004/02/skos/core#",
              "xml": "http://www.w3.org/XML/1998/namespace"}

doc = ET.parse("udcsummary-skos.rdf")
concepts = doc.findall(".//skos:Concept", namespaces)

for c in concepts:
    # find() returns the first matching child, or None if there is none.
    includingNote = c.find("udc:includingNote[@xml:lang='en']", namespaces)
    if includingNote is not None:
        print(includingNote.text)
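Note the use of is not None. Without that, it does not work: an Element that has no child elements evaluates as false, so a plain truth test would skip the notes that were actually found. This seems to be a peculiarity of ElementTree. See "Why does bool(xml.etree.ElementTree.Element) evaluate to False?".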
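The difference between "not found" and "found, but without children" can be seen in a minimal sketch. The tiny XML string and the element names note and absent are invented for illustration. The sketch inspects len() rather than the truth value itself, because recent Python 3 releases warn when an Element is truth-tested.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, just to show the two cases.
root = ET.fromstring("<root><note>some text</note></root>")

note = root.find("note")       # found: an Element with text but no children
missing = root.find("absent")  # not found: find() returns None

# len() counts child elements only, and the truth value of an Element
# has historically followed len(), so a leaf element counts as "empty"
# even though it carries text.
print(len(note))          # 0
print(note.text)          # some text
print(note is not None)   # True
print(missing is None)    # True
```

Comparing against None distinguishes the two cases reliably, which is why the variant above uses it.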