I'm trying to parse the HMDB Saliva Metabolites dataset (an XML file)
into a list of dictionaries, using the xmltodict package. The format of the data and the output structure I'm trying to create are shown in the first two code blocks of my previous question.
This is the code:
# Import packages
import xml.etree.ElementTree as et
import xmltodict
# load data
data1 = et.parse('D:/path/To/Projects/HMDB/DataSets/saliva_metabolites/saliva_metabolites.xml')
root = data1.getroot()
xmlstr = et.tostring(root, encoding='utf-8', method='xml')
data_dict = dict(xmltodict.parse(xmlstr))
Now, when I try to access a specific key, I get a MemoryError:
>> data_dict['ns0:hmdb']
Traceback (most recent call last):
File "<input>", line 1, in <module>
MemoryError
I'm using PyCharm, and next to the data_dict object
I noticed this message: Unable to get repr for <class 'dict'>
I'm not sure what other information about my system is needed, besides:
>> print(sys.version)
3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)]
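In case it is relevant, the build string above reports a 32-bit interpreter. Below is a minimal sketch for confirming that from inside Python using only the standard library; `interpreter_bits` is a hypothetical helper name of my own, not part of any package.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter in bits.

    Hypothetical helper: struct.calcsize("P") is the size of a C pointer
    in bytes, so multiplying by 8 gives 32 or 64 on common builds.
    """
    return struct.calcsize("P") * 8


# Print the same version string as above, plus the detected pointer width.
print(sys.version)
print(f"{interpreter_bits()}-bit interpreter, sys.maxsize = {sys.maxsize}")
```

A 32-bit process can only address a few gigabytes of memory at most, which may matter when the whole XML file is held in memory as a string and as a dictionary at the same time.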
Any ideas, hints, or clues would be appreciated.