Efficiently counting elements in a very large XML doc using lxml

Question

I have a very large (1.8GB) XML document. I'd like to simply find the number of elements with the tag <Product>.

I've got this far:

from lxml import etree

# iterparse reports an 'end' event for each <Product> element
context = etree.iterparse('./test.xml', tag='Product')
num_elems = 0
for event, elem in context:
    num_elems += 1
print(num_elems)

It works, but is there a faster way of doing it?

I think you're on the right track; it's just that you have a large XML file. http://stackoverflow.com/questions/324214/what-is-the-fastest-way-to-parse-large-xml-docs-in-python – Bob, May 22 '12 at 13:48

score 1 · Accepted Answer · answered May 22 '12 at 13:59

Since this works, I take it that memory use is not an issue (iterparse will build a tree of the entire file in memory unless you prune it while iterating over the elements). In that case, save yourself the trouble of iterating and counting in Python and let lxml/libxml2 do the counting in C:

from lxml import etree

tree = etree.parse("./test.xml")
num_elems = int(tree.xpath("count(//Product)"))    # count() returns a float, so convert it
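If memory does turn out to be a problem, the pruning mentioned above can be done while iterating. This is a minimal sketch, not part of the original answer: `count_elements` is a hypothetical helper name, the sample document is made up, and the clear-then-delete-previous-siblings loop is a commonly used lxml idiom. It still counts in Python, so it gives up some speed in exchange for much lower memory use.

```python
import io

from lxml import etree


def count_elements(source, tag):
    """Count elements named `tag`, pruning the tree while iterating.

    `source` may be a filename or a binary file-like object.
    """
    count = 0
    # With tag=..., iterparse only reports 'end' events for matching elements.
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        count += 1
        # The element has been counted, so drop its children and text.
        elem.clear()
        # Also delete earlier siblings that the parent still references.
        # Ancestors that are not `tag` elements themselves are left in place.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return count


# Hypothetical sample document; with a real file, pass "./test.xml" instead.
sample = b"<Catalog><Product/><Other/><Product><Product/></Product></Catalog>"
print(count_elements(io.BytesIO(sample), "Product"))  # prints 3
```

For the file in the question, the call would be `count_elements('./test.xml', 'Product')`. Memory stays low when the `<Product>` elements are siblings; container elements that hold them are only removed if they are siblings of a matched element.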

Fred Foo

Is this a better way than the following code: `tree = etree.parse("./test.xml"); products = tree.findall("Product"); num_elems = len(products) if products is not None else None`? – Petr Krampl Jul 25 '18 at 14:44
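The two approaches in this comment do not count the same thing. On a parsed tree, `findall("Product")` matches only direct children of the root element, while `//Product` in XPath matches `<Product>` elements at any depth. `findall()` also always returns a list, so the `is not None` check is never needed. A small sketch with a made-up document illustrates the difference:

```python
from lxml import etree

# Made-up document: one <Product> directly under the root, two nested deeper.
root = etree.fromstring(
    b"<Catalog><Product/><Group><Product/><Product/></Group></Catalog>"
)
tree = etree.ElementTree(root)

# findall("Product") only looks at direct children of the root element.
direct = len(tree.findall("Product"))
# ".//Product" searches all descendants of the root; here it matches
# the same elements as the XPath expression //Product.
anywhere = len(tree.findall(".//Product"))
# XPath count() returns a float and does not build a Python list of elements.
counted = tree.xpath("count(//Product)")

print(direct, anywhere, counted)  # prints 1 3 3.0
```

Both variants still parse the whole file into memory first; they differ only in which elements match and in how the matches are counted.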
