I am trying to parse XML documents in which the URI for the same namespace does not always use the same case (some XML owners decided to lower-case their URIs). If I parse a document that uses one form of the URI and then a document that uses the other form, the parser fails to find my data, even though I update the ns dictionary to match the URI of each document. Here is an example:
from cStringIO import StringIO
import xml.etree.ElementTree as ET
DATA_lc = '''<?xml version="1.0" encoding="utf-8"?>
<container xmlns:roktatar="http://www.example.com/lower/case/bug">
<item>
<roktatar:author>Boby Mac Gallinger</roktatar:author>
</item>
</container>'''
DATA_UC = '''<?xml version="1.0" encoding="utf-8"?>
<container xmlns:roktatar="http://www.example.com/Lower/Case/Bug">
<item>
<roktatar:author>John-John Le Grandiosant</roktatar:author>
</item>
</container>'''
tree = ET.parse(StringIO(DATA_lc))
root = tree.getroot()
ns = {'roktatar': 'http://www.example.com/lower/case/bug'}
for item in root.iter('item'):
    print item.find('roktatar:author', namespaces=ns).text.strip()
tree = ET.parse(StringIO(DATA_UC))
root = tree.getroot()
ns = {'roktatar': 'http://www.example.com/Lower/Case/Bug'}
for item in root.iter('item'):
    print item.find('roktatar:author', namespaces=ns).text.strip()
If each parsing block is run on its own, the data is collected properly, but if they run one after the other, the second one always fails. Am I missing some reset or cleanup of the parser between documents? Is this a bug?
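For comparison, here is a minimal sketch of a variant that avoids the prefix and the ns dictionary entirely by spelling out the tag in the `{uri}tag` form that ElementTree uses internally. It rests on an assumption I have not confirmed: that the failure comes from the compiled `'roktatar:author'` path being reused across calls regardless of the namespaces argument. The helper name `find_authors` and the shortened `DOC_LC` / `DOC_UC` strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copies of the two documents above; only the URI case differs.
DOC_LC = '''<container xmlns:roktatar="http://www.example.com/lower/case/bug">
<item><roktatar:author>Boby Mac Gallinger</roktatar:author></item>
</container>'''

DOC_UC = '''<container xmlns:roktatar="http://www.example.com/Lower/Case/Bug">
<item><roktatar:author>John-John Le Grandiosant</roktatar:author></item>
</container>'''


def find_authors(xml_text, uri):
    """Hypothetical helper: collect author names using a fully qualified tag."""
    root = ET.fromstring(xml_text)
    # '{uri}author' names the element directly, so no prefix mapping is needed
    # and the path string differs whenever the URI differs.
    tag = '{%s}author' % uri
    return [item.find(tag).text.strip() for item in root.iter('item')]


authors_lc = find_authors(DOC_LC, 'http://www.example.com/lower/case/bug')
authors_uc = find_authors(DOC_UC, 'http://www.example.com/Lower/Case/Bug')
```

If the assumption holds, both calls should return their author even when run back to back, because the two path strings are no longer identical.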
Thanks