
I have a 70 GB dataset (I already posted about how to read huge XML files). I tried the iteration method and increased my RAM from 4 GB to 8 GB; the file reads fine for 7-8 hours, but after that the IDE closes and the system hangs. I tried PyCharm, Anaconda, and Spyder. Is there another way to read this file in full without any issues?

Here is the code I tried:

import xml.etree.ElementTree as etree

count = 0
for event, elem in etree.iterparse('Tags.xml', events=('start', 'end')):
    for rows in elem:
        count = count + 1
        print(rows.attrib)
elem.clear()
waseem
  • If that is your code, you did not correctly follow the example given in the answer to your previous question. The point is to clear the elements as you are working on them, not once the loop is complete. Look again at this answer https://stackoverflow.com/a/326541/9794932 and note where elem.clear is (hint: inside the loop). – Rob Bricheno Aug 20 '19 at 16:43
  • Possible duplicate of [Efficient way to iterate through xml elements](https://stackoverflow.com/questions/4695826/efficient-way-to-iterate-through-xml-elements) – wwii Aug 20 '19 at 16:54
  • But when I write elem.clear() inside the loop it only prints two lines. @Rob Bricheno – waseem Aug 24 '19 at 00:42
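
As the first comment above points out, the element should be cleared inside the loop, while each record is being processed, so that finished records are released before the next one is parsed. Below is a minimal sketch of that pattern, assuming the Stack Exchange dump layout where each record in Tags.xml is a <row> element; the tag name and the root-clearing detail are illustrative, not taken from the original post.

import xml.etree.ElementTree as etree

count = 0
context = etree.iterparse('Tags.xml', events=('start', 'end'))
event, root = next(context)                    # first event is the 'start' of the root element
for event, elem in context:
    if event == 'end' and elem.tag == 'row':   # assumed record tag in the Stack Exchange dump
        count += 1
        print(elem.attrib)
        root.clear()                           # drop processed children so memory use stays bounded
print(count)

One likely reason the earlier attempt printed only a couple of lines is that clearing on a 'start' event (or before iterating an element's children) discards the data before it has been read; the sketch above therefore only touches elements on their 'end' event.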

0 Answers