
I am unable to parse a huge XML file with lxml. From what I have read, lxml's iterparse keeps loading the XML file until it reaches the tag it is looking for. Here is a snippet of my code:

import sys
from lxml import etree

for event, child in etree.iterparse(xml_file, tag='test'):
    print(sys.getsizeof(child))

The process is killed before it ever reaches the print statement. Is there a way to make this work?

  • XML is a tree structure, and to parse it properly you need a stateful parser. That requires a lot of resources. While `iterparse` is more memory efficient than other parsing strategies, it will still require a lot of RAM to parse 3 GB. Or seen from the other perspective: it's not a great idea to create huge XML files. – Klaus D. May 15 '20 at 07:34
  • I am running the above code on a server. I understand that it will require a lot of RAM, but is there any way to parse such a huge file? – Prit Modi May 15 '20 at 08:12
  • Another useful answer: https://stackoverflow.com/a/42193997/1566221 – rici May 15 '20 at 14:25
