I'm trying to parse the HMDB Saliva Metabolites dataset, an XML file, into a list of dictionaries using the xmltodict package. The format of the data and the output structure I'm trying to create are shown in the first two code blocks of my previous question.

This is the code:

# Import packages
import xml.etree.ElementTree as et
import xmltodict

# load data
data1 = et.parse('D:/path/To/Projects/HMDB/DataSets/saliva_metabolites/saliva_metabolites.xml')
root = data1.getroot()

xmlstr = et.tostring(root, encoding='utf-8', method='xml')
data_dict = dict(xmltodict.parse(xmlstr))

Now, when I try to access a specific key, I get a MemoryError:

>>> data_dict['ns0:hmdb']
Traceback (most recent call last):
  File "<input>", line 1, in <module>
MemoryError

I'm using PyCharm, and next to the data_dict object I noticed the message: Unable to get repr for <class 'dict'>

I'm not sure what other information about my system is needed besides:

>>> print(sys.version)
3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)]

Any ideas, hints, or clues would be appreciated.

TaL
    This really should mean that the Python process could not allocate the memory it asked for. It seems you're using a 32-bit interpreter, so even if memory is available, a single process cannot address more than 4 GB. If it's plausible that this much memory could be needed here, and your system otherwise still had resources available, try using a 64-bit interpreter as a first step. – Ondrej K. Aug 30 '20 at 11:51
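
To confirm the point raised in the comment, the interpreter's bitness can be checked from inside Python. The snippet below is only a minimal sketch using the standard library; the helper name `interpreter_bits` is hypothetical and not part of the original code.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 4 on a 32-bit build, 8 on a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"{interpreter_bits()}-bit interpreter")
    # Cross-check: sys.maxsize exceeds 2**32 only on 64-bit builds.
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32, the `32 bit (Intel)` part of the `sys.version` string above is confirmed, and running the same script under a 64-bit interpreter would be the first thing to try.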
