In answering another question, someone showed me the following tutorial, in which the author claims to have used iterparse to parse a ~100 MB XML file in under 3 seconds:
http://eli.thegreenplace.net/2012/03/15/processing-xml-in-python-with-elementtree/
I am trying to parse an ~90 MB XML file, and I have the following code:
from xml.etree.cElementTree import *

count = 0
for event, elem in iterparse('foo.xml'):
    if elem.tag == 'identifier' and elem.text == 'bar':
        count += 1
    elem.clear() # discard the element

print count
It takes about thirty seconds, roughly an order of magnitude slower than the time reported in the tutorial, even though I am using a similarly sized file, a similar algorithm, and the same package.
Could someone tell me what might be wrong with my code, or what differences between my situation and the tutorial's I might be overlooking?
I am using Python 2.7.3.
Addendum:
I am also using a reasonably powerful machine, in case anyone suspects the hardware is the cause.
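For anyone who wants to reproduce the measurement, here is a minimal sketch of how the loop above could be wrapped in a timer. The helper names `count_matches` and `timed_count` are hypothetical and only for illustration; the tag `identifier`, the text `bar`, and the file name `foo.xml` come from the code above. The import fallback is an assumption made so the same sketch runs on Python 2.7 (where `cElementTree` is the C implementation) and on newer versions where that module no longer exists.

```python
from __future__ import print_function

import time

try:
    import xml.etree.cElementTree as ET  # C implementation on Python 2.7
except ImportError:
    import xml.etree.ElementTree as ET  # Python 3.3+ uses the C accelerator itself


def count_matches(source, tag='identifier', text='bar'):
    """Count elements with the given tag and text, clearing each one after use.

    `source` may be a file name or a file object opened in binary mode.
    """
    count = 0
    for event, elem in ET.iterparse(source):  # default: 'end' events only
        if elem.tag == tag and elem.text == text:
            count += 1
        elem.clear()  # discard the element's text, attributes and children
    return count


def timed_count(source):
    """Return (count, elapsed_seconds) for one pass over `source`."""
    start = time.time()
    count = count_matches(source)
    return count, time.time() - start
```

Calling `count, seconds = timed_count('foo.xml')` and printing both values gives a number that can be compared directly with the tutorial's figure. Keeping the parse in a function rather than at module level also makes it easy to run under a profiler.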