python ElementTree xml: parsing fromstring vs building elements

Question

Given a large array of numbers where:

[1, 2, 3, 4 ...] => <tag attrib="1" />
                    <tag attrib="2" />
                    <tag attrib="3" />
                    <tag attrib="4" />
                    ...

Which is more efficient/fast:

a) building them from scratch using Element("name", attributes) and appending them to some root

or

b) fromstring(str) where str is the string representation of those tags in the example
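Here is a minimal Python 3 sketch of the two options, for comparison. It assumes the root element is called `root`, since the question only says "some root". The function names `build_direct` and `build_from_string` are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET

def build_direct(numbers):
    """Option a): create each <tag> Element directly and append it to the root."""
    root = ET.Element("root")
    for n in numbers:
        # ElementTree expects string attribute values, so convert each number.
        root.append(ET.Element("tag", {"attrib": str(n)}))
    return root

def build_from_string(numbers):
    """Option b): build the XML text first, then let fromstring() parse it."""
    text = "<root>" + "".join('<tag attrib="%d" />' % n for n in numbers) + "</root>"
    return ET.fromstring(text)

numbers = [1, 2, 3, 4]
a = build_direct(numbers)
b = build_from_string(numbers)
# Both approaches end up with the same elements and attribute values.
print([e.get("attrib") for e in a])  # ['1', '2', '3', '4']
print([e.get("attrib") for e in b])  # ['1', '2', '3', '4']
```

Either way the result is an `Element` tree. The only question is how much work it takes to get there.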

I would guess `(a)`, as `(b)` parses the text back into an `Element` and then applies `(a)` to it to return your final result. — Blender, Jul 11 '12 at 22:33
Sorry if it wasn't very clear. All I have is that array of numbers, and I want the Elements at the end. What b) suggests is iterating over the array, building the string '<tag attrib="1" />...' for all the numbers, and then passing that final string to be parsed. — WindowsMaker, Jul 11 '12 at 22:46
Yep. `b)` will end up building a string that is then parsed into `Element`s, but `a)` builds the `Element`s directly. `a)` is one step shorter. — Blender, Jul 11 '12 at 22:47
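Blender's "one step shorter" point becomes concrete once you look at what parsing involves. In the stdlib, `fromstring()` is an alias of `XML()`, which feeds the text to an `XMLParser` whose target is a `TreeBuilder`. The `TreeBuilder` is what finally creates the `Element` objects. The sketch below spells out those steps so that the extra tokenizing work in b) is visible. It is an illustration, not the library's exact source, and `parse_explicitly` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def parse_explicitly(text):
    """Roughly what fromstring(text) does: tokenize the text, and let a
    TreeBuilder turn the start/end events into Element objects."""
    parser = ET.XMLParser(target=ET.TreeBuilder())
    parser.feed(text)
    # close() returns whatever the target's close() returns: the root Element.
    return parser.close()

text = '<root><tag attrib="1" /><tag attrib="2" /></root>'
root = parse_explicitly(text)
print(root.tag, [t.get("attrib") for t in root])  # root ['1', '2']
```

Option a) skips the text and the tokenizer entirely and goes straight to the `Element` constructor.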

score 0 · Accepted Answer · edited May 23 '17 at 10:09

updated test:

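from __future__ import print_function  # lets the same code run on Python 2 and 3
from timeit import timeit
from xml.etree.ElementTree import Element

elist = list(range(1000))

def test_normal():
    eroot = Element('root')
    for e in elist:
        eroot.append(Element("tag", {"attrib": "%s" % e}))

def test_list():
    # List comprehension used only for its side effect, kept for comparison.
    eroot = Element('root')
    [eroot.append(Element("tag", {"attrib": "%s" % e})) for e in elist]


print("etree: %.6f" % timeit(test_normal, number=1000))
print("l-cmp: %.6f" % timeit(test_list, number=1000))

# Re-importing rebinds the module-level names, so the same test functions
# now use the C implementation. cElementTree was deprecated in Python 3.3 and
# removed in 3.9; on Python 3, xml.etree.ElementTree already uses the
# C accelerator automatically, so the fallback times the same thing again.
try:
    from xml.etree.cElementTree import Element, fromstring
except ImportError:
    from xml.etree.ElementTree import Element, fromstring
#from lxml.etree import Element, fromstring

print("ctree: %.6f" % timeit(test_normal, number=1000))
print("c-cmp: %.6f" % timeit(test_list, number=1000))

def test_string():
    # Build the whole document as text, then parse it (option b).
    eroot = "<root>"
    tags = ['<tag attrib="%s" />' % e for e in elist]
    eroot += ' '.join(tags) + '</root>'
    tree = fromstring(eroot)

print("strng: %.6f" % timeit(test_string, number=1000))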


Output (total seconds for 1000 calls of each function, Python 2):

etree: 13.302093
l-cmp: 12.276725
ctree: 5.482961
c-cmp: 5.692758
strng: 6.578780

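cElementTree is the fastest version. Building the elements directly with it took about 5.5 to 5.7 s, while building a string and parsing it with fromstring took about 6.6 s. So I would say: don't use the string abomination! ;-)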

answered Jul 11 '12 at 23:59 by Don Question

Did you try SubElement? It was minimally slower than append in my setup; I just didn't include it here. The list comprehension is there just so nobody claims I ignored its performance gains. Where did I forget fromstring? In test_list()? – Don Question Jul 12 '12 at 00:25
You forgot to call fromstring in the 3rd example. Don't use a list comprehension for side effects. – jfs Jul 12 '12 at 00:25
I've removed my mention of SubElement (it is not documented in the stdlib's docs). I don't know whether there is a performance difference. – jfs Jul 12 '12 at 00:34
10 is not a large number (see the first sentence in the question). Try your benchmark with larger numbers. You could time it as `python -mtimeit -s 'from m import test_list as f' 'f()'` – jfs Jul 12 '12 at 00:39
Ah, I didn't look into the docs. I'm still mostly using Python 2.x, so I can't say whether SubElement is gone in 3, but in 2.6 and 2.7 it's still there, just a little bit slower (maybe that's just a fluke). – Don Question Jul 12 '12 at 00:41
There's no change in magnitude between the different approaches, but timeit's default of 10^6 runs takes ages for a quick & dirty orientation ;-) – Don Question Jul 12 '12 at 00:52
@J.F.Sebastian: Yes, but it proves that fromstring is incredibly fast if you don't actually call it. :) – abarnert Jul 12 '12 at 01:29
;-) Ahhh, you mean the b) from the OP! lol, I just ignored it totally XD – Don Question Jul 12 '12 at 01:35
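The comments above suggest three changes: try SubElement, use a plain for loop instead of a list comprehension for side effects, and benchmark with larger inputs. The Python 3 sketch below combines them. It times direct appending, `SubElement`, and a `fromstring` version that actually gets called. `N = 100_000` and the repeat counts are arbitrary assumptions, and the function names are hypothetical. No timings are claimed here because they depend on the machine and the Python version.

```python
import timeit
import xml.etree.ElementTree as ET

N = 100_000  # assumption: a "large" input size; adjust as needed
numbers = list(range(N))

def with_append():
    root = ET.Element("root")
    for n in numbers:  # plain loop, no list comprehension used for side effects
        root.append(ET.Element("tag", {"attrib": str(n)}))
    return root

def with_subelement():
    root = ET.Element("root")
    for n in numbers:
        # SubElement creates the child and appends it to root in one call.
        ET.SubElement(root, "tag", {"attrib": str(n)})
    return root

def with_fromstring():
    # fromstring() is really called here, so the parsing cost is measured.
    text = "<root>" + "".join('<tag attrib="%d" />' % n for n in numbers) + "</root>"
    return ET.fromstring(text)

if __name__ == "__main__":
    for func in (with_append, with_subelement, with_fromstring):
        # Best of 5 repeats of 1 call each; the timeit docs suggest the minimum
        # as the least noisy summary.
        best = min(timeit.repeat(func, number=1, repeat=5))
        print("%-16s %.3f s" % (func.__name__, best))
```

All three functions produce the same tree, so any difference in the printed numbers comes only from how the elements are created.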
