I answered several questions here by using this to "flatten" a list of lists:
>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])
It works fine and yields:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
However, I was told that sum effectively does a = a + b at each step, which is not as performant as itertools.chain.
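To make the "a = a + b" remark concrete, here is a rough sketch of what I understand sum(l, []) to be doing. The helper name sum_like is mine, purely for illustration, and this only approximates the built-in's behavior:

```python
import itertools

def sum_like(lists):
    # Hypothetical stand-in for sum(lists, []): the accumulator is rebound
    # to a brand-new list at every step.
    result = []
    for sub in lists:
        result = result + sub  # copies everything accumulated so far
    return result

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat_sum = sum_like(l)
flat_chain = list(itertools.chain.from_iterable(l))  # single pass over the items
print(flat_sum == flat_chain)
```

If that picture is right, the amount of copying grows with the number of sublists, which would matter more as the outer list gets longer.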
My planned question was "why is this possible on lists when it is prevented on strings", but I ran a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:
import timeit

# Import itertools in the setup string so the timed statement can see it explicitly.
setup = 'import itertools; l = [[1,2,3],[4,5,6],[7,8,9]]'
print(timeit.timeit("sum(l,[])", setup=setup))
print(timeit.timeit("list(itertools.chain.from_iterable(l))", setup=setup))
I ran this several times and always get about the same figures:
0.7155522836070246
0.9883352857722025
To my surprise, chain (recommended over sum for lists by everyone in several comments on my answers) is roughly 40% slower.
chain is still interesting when iterating in a for loop, because it doesn't actually create the list, but when the list has to be created, sum wins.
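For reference, this is the kind of loop I mean, where the flattened list is never materialized (a minimal sketch; the variable total is just for illustration):

```python
import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

total = 0
# chain.from_iterable yields the items one by one; no intermediate list is built
for item in itertools.chain.from_iterable(l):
    total += item
print(total)
```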
So should we drop itertools.chain and use sum when the expected result is a list?
EDIT: thanks to some comments, I ran another test with a larger number of lists:
s = 'import itertools; l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))
Now I get the opposite:
6.479897810702537
3.793455760814343
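To see where the two approaches cross over, the same comparison can be repeated for several list counts. This is only a sketch: the helper bench and the sizes 3, 20 and 200 are my own choices, and the absolute figures will depend on the machine and the Python version:

```python
import itertools
import timeit

def bench(n_lists, number=1000):
    # n_lists sublists of three items each, mirroring the tests above
    l = [[4, 5, 6] for _ in range(n_lists)]
    t_sum = timeit.timeit(lambda: sum(l, []), number=number)
    t_chain = timeit.timeit(
        lambda: list(itertools.chain.from_iterable(l)), number=number
    )
    return t_sum, t_chain

for n in (3, 20, 200):
    t_sum, t_chain = bench(n)
    print(f"{n:>4} lists: sum={t_sum:.4f}s  chain={t_chain:.4f}s")
```

Note that number is lowered from timeit's default of one million runs so that the larger cases finish quickly; only the relative ordering at each size is meaningful.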