I have a very large XML file with 40,000 elements. When I parse this file with ElementTree, it fails with memory errors. Is there a Python module that can read the XML file in chunks without loading the entire document into memory? And how would I use it?
- I'm no pythonist, but look for a SAX (not DOM) approach for parsing XMLs. – Mārtiņš Briedis Feb 12 '12 at 13:44
- SAX is perfect as long as the problem doesn't require random access to the tags. If that's not the case, you can still use it if there's a way to build a more compact representation of the data in memory. – Ricardo Cárdenes Feb 12 '12 at 13:50
- lxml is best; it is recommended and used by IBM as well :) – codersofthedark Mar 14 '12 at 05:53
2 Answers
Probably the best library for working with XML in Python is lxml; in this case you should be interested in iterparse/iterwalk.
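A minimal sketch of the iterparse approach, assuming the file is called data.xml and the repeated element is `<record>` (both hypothetical names); the clearing step at the end is what keeps memory usage flat:

```python
from lxml import etree

# Stream the file: iterparse fires an 'end' event each time a
# <record> subtree has been fully built, so only one record is
# held in memory at a time.
for event, elem in etree.iterparse('data.xml', events=('end',), tag='record'):
    # ... process the element here, e.g. via elem.text / elem.attrib ...
    print(elem.findtext('name'))  # 'name' is a hypothetical child tag

    # Release the element and any already-processed siblings;
    # without this the tree still accumulates as the file is read.
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]
```

The standard library's xml.etree.ElementTree also offers an iterparse with the same event interface, so the same pattern works there too (minus the tag filter and the getprevious/getparent helpers, which are lxml-specific).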

– Niklas B.
– Zach Kelling
- http://stackoverflow.com/questions/7171140/using-python-iterparse-for-large-xml-files is worth noting when working with big XML files. – Gareth Latty Feb 12 '12 at 13:58
This is a problem that people usually solve using SAX.
If your huge file is basically a bunch of XML documents aggregated inside an overall XML envelope, then I would suggest using SAX (or plain string parsing) to break it up into a series of individual documents that you can then process with lxml.etree.
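A rough sketch of that splitting idea with the standard library's xml.sax, assuming (hypothetically) a file data.xml whose root envelope directly wraps the individual documents; each completed chunk is handed to a callback, which could in turn feed it to lxml.etree.fromstring:

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class EnvelopeSplitter(xml.sax.ContentHandler):
    """Re-serializes each direct child of the root envelope and
    passes it, as a string, to a processing callback."""

    def __init__(self, process):
        super().__init__()
        self.process = process  # called with each finished sub-document
        self.depth = 0
        self.buffer = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth >= 2:  # inside an individual sub-document
            attr_text = ''.join(f' {k}={quoteattr(v)}' for k, v in attrs.items())
            self.buffer.append(f'<{name}{attr_text}>')

    def characters(self, content):
        if self.depth >= 2:
            self.buffer.append(escape(content))

    def endElement(self, name):
        if self.depth >= 2:
            self.buffer.append(f'</{name}>')
        if self.depth == 2:  # a complete sub-document just closed
            self.process(''.join(self.buffer))
            self.buffer = []
        self.depth -= 1

# Example usage: print each sub-document; in practice the callback
# could be lambda chunk: handle(lxml.etree.fromstring(chunk)).
xml.sax.parse('data.xml', EnvelopeSplitter(process=print))
```

Because SAX only reports events and never builds a tree, memory usage stays proportional to a single sub-document rather than the whole file.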

– Michael Dillon