
I am trying to read the XML file "Comments.xml", which is almost 19 GB, but it gives a memory error. I have 4 GB of RAM. I tried it in every IDE, but nothing works. I also searched and googled but did not find any clue :/ My code is:

import xml.etree.ElementTree as ET
# ElementTree(file=...) parses the whole document into an in-memory tree before the loop runs
tree = ET.ElementTree(file='Comments.xml')
root = tree.getroot()
for rows in root:
    print(rows.attrib)

When I run it, the IDE hangs for some time and then I get this error:

  line 598, in parse
    self._root = parser._parse_whole(source)
MemoryError

waseem
  • You can't read it in as one shot, so you'll need a streaming XML parser. – tadman May 30 '19 at 18:26
  • Streaming XML? How would I get that? – waseem May 30 '19 at 18:40
  • I'm not sure it will completely solve your problem, but you can try the `lxml` Python library, which allows *iterative parsing* (so the whole tree doesn't need to be loaded in memory). See some answers of https://stackoverflow.com/q/324214/5050917 for some examples. – mgc May 30 '19 at 18:40
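Following the streaming suggestion in the comments, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree.iterparse`, so no extra package is needed. It assumes Comments.xml has a flat layout of one root element containing many `<row .../>` elements (as in the Stack Exchange data dumps); the tag name `row` and the helper name `iter_row_attribs` are assumptions for illustration. The important part is calling `root.clear()` after each row, so finished rows are dropped instead of piling up in memory.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def iter_row_attribs(path: str, tag: str = "row") -> Iterator[Dict[str, str]]:
    """Yield the attributes of each <tag> element without building the whole tree.

    Assumes (hypothetically) a flat layout: <root><row .../><row .../>...</root>.
    """
    # iterparse reads the file in chunks and reports events as it goes
    context = ET.iterparse(path, events=("start", "end"))
    # The first event is the start of the root element; keep a handle to it
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # Copy the attributes before the element is discarded
            yield dict(elem.attrib)
            # Detach finished children from the root so they can be garbage-collected
            root.clear()
```

With that helper, the loop from the question becomes `for attrib in iter_row_attribs('Comments.xml'): print(attrib)`.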

1 Answer


I ran into this problem too this morning, and fortunately I found a good solution: the xmltodict library.

To avoid using a huge amount of memory, you can use its streaming mode. Here is an example:

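from gzip import GzipFile
import xmltodict

# Called for each element at item_depth; the first argument is its path
def handle_artist(_, artist):
    print(artist['name'])
    return True  # a falsy return value stops parsing

# item_depth=2: hand each child of the root element to the callback as it is parsed
xmltodict.parse(GzipFile('discogs_artists.xml.gz'), item_depth=2, item_callback=handle_artist)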
  • I used xmltodict, but it gives the error `parser.Parse(xml_input, True) ExpatError: syntax error: line 1, column 0`. My code is: `import xmltodict`; `def handle_artist(_, artist): print(artist['person']); return True`; `xmltodict.parse('activity.xml', item_depth=2, item_callback=handle_artist)` – Zahra Dec 13 '21 at 21:29
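The `ExpatError` in the last comment is most likely because `xmltodict.parse` treats a `str` argument as the XML text itself, not as a file name, so expat sees the characters `activity.xml` and rejects them at the first one. The sketch below reproduces this with the standard library's `xml.parsers.expat`, the parser xmltodict uses; the str-versus-file dispatch in the hypothetical helper `parse_outcome` mirrors the `parser.Parse(xml_input, True)` call visible in the traceback, and the XML contents are invented for the demonstration.

```python
import io
from xml.parsers import expat


def parse_outcome(source) -> str:
    """Feed `source` to expat and report "ok" or the error message.

    A str is treated as XML *text*; an object with .read() is read as a file.
    """
    parser = expat.ParserCreate()
    try:
        if hasattr(source, "read"):
            parser.ParseFile(source)
        else:
            parser.Parse(source, True)
    except expat.ExpatError as err:
        return str(err)
    return "ok"


# The file name itself is not XML, so expat rejects it at the first character
name_result = parse_outcome("activity.xml")

# An in-memory stand-in for open("activity.xml", "rb"); the contents are invented
file_result = parse_outcome(io.BytesIO(b"<activities><person>Zahra</person></activities>"))

print(name_result)
print(file_result)
```

In xmltodict terms, the fix is to pass an open file instead of its name: `with open('activity.xml', 'rb') as f: xmltodict.parse(f, item_depth=2, item_callback=handle_artist)`.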