
I am trying to use the following script, and I have already had to fix a few errors in it. I got it running, but for most of the data it tries to process, the following error arises:

C:\Users\Alexa\OneDrive\Skrivbord\Database Samples\LIDC-IDRI-0001\1.3.6.1.4.1.14519.5.2.1.6279.6001.175012972118199124641098335511\1.3.6.1.4.1.14519.5.2.1.6279.6001.141365756818074696859567662357\068.xml
Traceback (most recent call last):
  File "C:\Users\Alexa\OneDrive\Documents\Universitet\Nuvarande\KEX\LIDC-IDRI-processing-master\lidc_data_to_nifti.py", line 370, in <module>
    parse_xml_file(xml_file)
  File "C:\Users\Alexa\OneDrive\Documents\Universitet\Nuvarande\KEX\LIDC-IDRI-processing-master\lidc_data_to_nifti.py", line 311, in parse_xml_file
    root=xmlHelper.create_xml_tree(file)
  File "C:\Users\Alexa\OneDrive\Documents\Universitet\Nuvarande\KEX\LIDC-IDRI-processing-master\lidcXmlHelper.py", line 23, in create_xml_tree
    for at in el.attrib.keys(): # strip namespaces of attributes too
RuntimeError: dictionary keys changed during iteration

This corresponds to the following code:

def create_xml_tree(filepath):
    """
    Method to ignore the namespaces if ElementTree is used.
    Necessary because ElementTree, by default, extends
    tag names by the namespace, but the namespaces used in the
    LIDC-IDRI dataset are not consistent.
    Solution based on https://stackoverflow.com/questions/13412496/python-elementtree-module-how-to-ignore-the-namespace-of-xml-files-to-locate-ma
    
    instead of ET.fromstring(xml)
    """
    it = ET.iterparse(filepath)
    for _, el in it:
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
        for at in el.attrib.keys(): # strip namespaces of attributes too
            if '}' in at:
                newat = at.split('}', 1)[1]
                el.attrib[newat] = el.attrib[at]
                del el.attrib[at]
    return it.root

I am not at all familiar with reading XML files in Python, and this problem has had me stuck for the last two days. I read threads both here and on other forums, but they did not give me significant insight. From what I understand, the problem arises because we are modifying the same object we are iterating over, which is not allowed? I tried to fix this by making a copy of the file and changing the copy depending on what is read from the original file, but I could not get that working properly.
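For reference, here is a minimal sketch of what "not modifying the object being iterated" could look like. It assumes the only problem is that the loop walks `el.attrib.keys()` (a live view of the dictionary) while the body adds and deletes keys. Iterating over a snapshot made with `list(...)` should avoid that. The sample XML and the `urn:example:a` namespace are made up for illustration and are not taken from the LIDC-IDRI data.

```python
import io
import xml.etree.ElementTree as ET


def create_xml_tree(source):
    """Parse `source` (a path or file object) and strip namespaces in place."""
    it = ET.iterparse(source)
    for _, el in it:
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]  # strip namespace from the tag
        # list(...) copies the keys first, so the dict can be changed in the loop
        for at in list(el.attrib):
            if '}' in at:
                # re-insert the value under the namespace-free name
                el.attrib[at.split('}', 1)[1]] = el.attrib.pop(at)
    return it.root


# Hypothetical sample document, used only to exercise the function
sample = io.BytesIO(
    b'<a:root xmlns:a="urn:example:a" a:uid="1"><a:child a:id="2"/></a:root>'
)
root = create_xml_tree(sample)
print(root.tag, root.attrib, root[0].tag, root[0].attrib)
```

With this change the function still returns the same kind of `Element` as before, so the rest of the script should not need to change.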

Don_twice
  • One option could be to use a wildcard for the namespace instead of removing it. See https://stackoverflow.com/a/62117710/407651 – mzjn Mar 24 '22 at 07:52
  • Thank you for the input! Would this result in the same elementTree being returned from the function? Or would this manipulate the result in such a way that it's different? – Don_twice Mar 24 '22 at 07:55
  • It would mean accepting that there is at least one namespace instead of trying to get rid of it. On the other hand you might also be interested in this: https://stackoverflow.com/q/5384914/407651 – mzjn Mar 24 '22 at 08:05
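The wildcard approach suggested in the comments could look roughly like the sketch below. It assumes Python 3.8 or newer, where `find` and `findall` accept `{*}` as "any namespace". The two documents and the `urn:example:v1` / `urn:example:v2` namespaces are invented stand-ins for files with inconsistent namespaces. Unlike the stripping approach, the tree itself is left unchanged and tags keep their `{namespace}` prefix.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in their default namespace
xml_a = '<r xmlns="urn:example:v1"><item>1</item></r>'
xml_b = '<r xmlns="urn:example:v2"><item>2</item></r>'

values = []
for doc in (xml_a, xml_b):
    root = ET.fromstring(doc)
    # '{*}item' matches <item> regardless of which namespace it is in
    values.extend(el.text for el in root.findall('{*}item'))

print(values)
```

The trade-off is that every lookup in the calling code has to use the `{*}` prefix, whereas stripping namespaces once lets existing lookups stay as they are.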
