Python attribute parsing returns None for xml:id

Question

I am trying to extract some information from a TEI file using this code:

import xml.etree.ElementTree as ET

tree = ET.parse(path)
root = tree.getroot()
body = root.find("{http://www.tei-c.org/ns/1.0}text/{http://www.tei-c.org/ns/1.0}body")
for s in body.iter("{http://www.tei-c.org/ns/1.0}s"):
    for w in s.iter("{http://www.tei-c.org/ns/1.0}w"):
        wordpart = w.find("{http://www.tei-c.org/ns/1.0}seg")
        word = ''.join(wordpart.itertext())
        type = w.get('type')
        xml = w.get('xml:id')
        print(type)
        print(xml)

The output for type is correct; it prints e.g. "noun". But for xml:id I only get None. This is an extract of the XML file I need to parse:

<w type="noun" xml:id="w.4940"><seg type="orth">sloterheighe</seg>...

Why are there two quotation marks at the end of `xml:id="w.4940""`? — glhr, Apr 30 '19 at 10:33

glhr · Accepted Answer · 2019-04-30T13:28:51.447


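To get the value of the xml:id attribute, you need to specify the namespace URI. ElementTree stores namespaced attribute names in the form {uri}localname, and the reserved xml: prefix is bound to http://www.w3.org/XML/1998/namespace, so the literal key 'xml:id' never matches (see this SO post for more details):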

xml = w.attrib['{http://www.w3.org/XML/1998/namespace}id']

or

xml = w.get('{http://www.w3.org/XML/1998/namespace}id')

Also, note that type is a built-in in Python, so using it as a variable name shadows the built-in; pick a different name.
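For illustration, here is a minimal, self-contained sketch of the lookup. The SAMPLE document is a hypothetical wrapper built around the extract from the question, and the names TEI_NS, XML_NS, word_type and word_id are chosen here for readability; they are not part of the original code.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# URI that the reserved xml: prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample: the <w> element from the question wrapped in a
# minimal TEI skeleton so that it parses on its own.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><s>'
    '<w type="noun" xml:id="w.4940"><seg type="orth">sloterheighe</seg></w>'
    '</s></body></text></TEI>'
)

root = ET.fromstring(SAMPLE)
ns = {"tei": TEI_NS}  # prefix map, used only for the path expressions

results = []
for w in root.iterfind(".//tei:w", ns):
    seg = w.find("tei:seg", ns)
    word = "".join(seg.itertext()) if seg is not None else ""
    word_type = w.get("type")           # un-namespaced attribute
    word_id = w.get(f"{{{XML_NS}}}id")  # namespaced attribute: {uri}id
    results.append((word, word_type, word_id))
    print(word, word_type, word_id)
```

Passing a prefix map to find/iterfind keeps the path expressions short, but the map is not applied to attribute names, so the attribute lookup still needs the full {uri}id form.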

edited Apr 30 '19 at 13:28

answered Apr 30 '19 at 13:21

