
I have a very large (1.8 GB) XML document, and I'd simply like to find the number of elements with the tag <Product>.

I've got this far:

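from lxml import etree

# Stream through the file; with tag='Product', only matching elements are yielded.
context = etree.iterparse('./test.xml', tag='Product')
num_elems = 0
for event, elem in context:
    num_elems += 1
print(num_elems)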

It works, but is there a faster way of doing it?

flossfan
I think you're on the right track; it's just that you have a large XML file. http://stackoverflow.com/questions/324214/what-is-the-fastest-way-to-parse-large-xml-docs-in-python – Bob May 22 '12 at 13:48

1 Answer


Since this works, I take it that memory use is not an issue (iterparse will build a tree of the entire file in memory unless you prune it while iterating over the elements). In that case, save yourself the trouble of iterating and counting in Python and let lxml/libxml2 do the counting in C:

from lxml import etree

tree = etree.parse("./test.xml")                # builds the whole tree in memory
num_elems = tree.xpath("count(//Product)")      # note: returns a float
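The pruning mentioned above can be sketched as a streaming count that throws elements away once they have been counted, in case a file this size ever does exhaust memory. This is a minimal sketch assuming lxml is installed; `count_tags` is a hypothetical helper name, and the clear-then-delete-previous-siblings loop is a common lxml memory idiom, not something taken from the posts in this thread. It is not necessarily faster than the XPath `count()`; its advantage is keeping memory roughly flat when the Product elements are siblings.

```python
import io
from lxml import etree


def count_tags(source, tag):
    """Count elements named `tag` while streaming, pruning as we go.

    `count_tags` is a hypothetical helper for illustration, not part of lxml.
    """
    count = 0
    # With tag=..., iterparse only yields 'end' events for matching elements,
    # i.e. each element is yielded once it has been fully parsed.
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        count += 1
        # Free the counted element's children and text.
        elem.clear()
        # clear() leaves the (now empty) element attached to its parent, so
        # also delete already-processed preceding siblings.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return count


# Made-up sample document: one Product under the root, two inside a Group.
xml = b"<Catalog><Product/><Group><Product/><Product/></Group></Catalog>"
print(count_tags(io.BytesIO(xml), "Product"))  # 3
```

For a real file, `count_tags("./test.xml", "Product")` would be the equivalent call, since `iterparse` accepts a filename as well as a file-like object.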
Fred Foo
Is this a better way than the following code: `tree = etree.parse("./test.xml"); products = tree.findall("Product"); num_elems = len(products) if products is not None else None`? – Petr Krampl Jul 25 '18 at 14:44
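One difference worth checking with the comment's alternative: in lxml, `tree.findall("Product")` behaves like `tree.getroot().findall("Product")`, so it matches only direct children of the root element, whereas `count(//Product)` counts Product elements at any depth (`findall(".//Product")` would also search all descendants). `findall` returns a list, possibly empty, never `None`, so the `is not None` check is redundant. It also creates a Python object for every match, so on a large file it is likely slower than letting XPath count in C. A small sketch on a made-up document, assuming lxml is installed:

```python
import io
from lxml import etree

# Made-up document: one Product directly under the root, two nested deeper.
xml = b"<Catalog><Product/><Group><Product/><Product/></Group></Catalog>"
tree = etree.parse(io.BytesIO(xml))

direct = tree.findall("Product")               # direct children of the root only
anywhere = tree.findall(".//Product")          # every descendant named Product
xpath_count = tree.xpath("count(//Product)")   # XPath count() gives a float

print(len(direct), len(anywhere), xpath_count)  # 1 3 3.0
print(tree.findall("Missing"))                  # [] (an empty list, not None)
```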