I have a very large XML file with 40,000 elements. When I parse this file with ElementTree, it fails with memory errors. Is there a Python module that can read the XML file in chunks without loading the entire document into memory? And how would I use it?
- I'm no pythonist, but look for a SAX (not DOM) approach for parsing XMLs. – Mārtiņš Briedis Feb 12 '12 at 13:44
- SAX is perfect as long as the problem doesn't require random access to the tags. If that's not the case, you can still use it if there's a way to build a more compact representation of the data in memory. – Ricardo Cárdenes Feb 12 '12 at 13:50
- lxml is best; it is recommended and used by IBM as well :) – codersofthedark Mar 14 '12 at 05:53
2 Answers
Probably the best library for working with XML in Python is lxml; in this case you should be interested in iterparse/iterwalk.
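A minimal sketch of the iterparse approach, assuming the file is called data.xml and the repeated element is `<record>` (both hypothetical names); the clearing step at the end is what keeps memory usage flat:

```python
from lxml import etree

# Stream the file: iterparse fires an 'end' event each time a
# <record> subtree has been fully built, so only one record is
# held in memory at a time.
for event, elem in etree.iterparse('data.xml', events=('end',), tag='record'):
    # ... process the element here, e.g. via elem.text / elem.attrib ...
    print(elem.findtext('name'))  # 'name' is a hypothetical child tag

    # Release the element and any already-processed siblings;
    # without this the tree still accumulates as the file is read.
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]
```

The standard library's xml.etree.ElementTree also offers an iterparse with the same event interface, so the same pattern works there too (minus the tag filter and the getprevious/getparent helpers, which are lxml-specific).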

– Niklas B.
– Zach Kelling
- http://stackoverflow.com/questions/7171140/using-python-iterparse-for-large-xml-files is worth noting when working with big XML files. – Gareth Latty Feb 12 '12 at 13:58
This is a problem that people usually solve using SAX.
If your huge file is basically a bunch of XML documents aggregated inside an overall XML envelope, then I would suggest using SAX (or plain string parsing) to break it up into a series of individual documents that you can then process with lxml.etree.
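A rough sketch of that splitting idea with the standard library's xml.sax, assuming (hypothetically) a file data.xml whose root envelope directly wraps the individual documents; each completed chunk is handed to a callback, which could in turn feed it to lxml.etree.fromstring:

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

class EnvelopeSplitter(xml.sax.ContentHandler):
    """Re-serializes each direct child of the root envelope and
    passes it, as a string, to a processing callback."""

    def __init__(self, process):
        super().__init__()
        self.process = process  # called with each finished sub-document
        self.depth = 0
        self.buffer = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth >= 2:  # inside an individual sub-document
            attr_text = ''.join(f' {k}={quoteattr(v)}' for k, v in attrs.items())
            self.buffer.append(f'<{name}{attr_text}>')

    def characters(self, content):
        if self.depth >= 2:
            self.buffer.append(escape(content))

    def endElement(self, name):
        if self.depth >= 2:
            self.buffer.append(f'</{name}>')
        if self.depth == 2:  # a complete sub-document just closed
            self.process(''.join(self.buffer))
            self.buffer = []
        self.depth -= 1

# Example usage: print each sub-document; in practice the callback
# could be lambda chunk: handle(lxml.etree.fromstring(chunk)).
xml.sax.parse('data.xml', EnvelopeSplitter(process=print))
```

Because SAX only reports events and never builds a tree, memory usage stays proportional to a single sub-document rather than the whole file.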

– Michael Dillon