
I answered several questions here by using this to "flatten" a list of lists:

>>> l = [[1,2,3],[4,5,6],[7,8,9]]
>>> sum(l,[])

It works fine and yields:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

although I was told that sum performs a = a + b under the hood, which is not as performant as itertools.chain.
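
In other words, sum(l, []) behaves roughly like the following sketch (the semantics, not CPython's actual implementation):

result = []
for sub in l:
    result = result + sub  # a = a + b: builds a brand-new list each step, recopying everything accumulated so far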

My planned question was "why is it possible on lists where it is prevented on strings", but I made a quick benchmark on my machine comparing sum and itertools.chain.from_iterable on the same data:

import itertools, timeit

print(timeit.timeit("sum(l,[])",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup='l = [[1,2,3],[4,5,6],[7,8,9]]'))

I ran that several times and always got roughly the same figures:

0.7155522836070246
0.9883352857722025

To my surprise, chain, which several commenters on my answers recommended over sum for lists, is much slower.

chain is still interesting when you iterate in a for loop, because it never actually builds the list, but when the goal is to build a list, sum wins.
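
For example, a minimal sketch of that lazy case, where no intermediate list is ever built:

import itertools

l = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for x in itertools.chain.from_iterable(l):
    print(x)  # elements are yielded one at a time, lazily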

So should we drop itertools.chain and use sum when the expected result is a list?

EDIT: thanks to some comments, I made another test, increasing the number of lists:

s = 'l = [[4,5,6] for _ in range(20)]'
print(timeit.timeit("sum(l,[])",setup=s))
print(timeit.timeit("list(itertools.chain.from_iterable(l))",setup=s))

Now I get the opposite:

6.479897810702537
3.793455760814343
Jean-François Fabre

2 Answers


Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
...               'l = [[i] for i in range(5000)]; import itertools',
...               number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in range(5000)]', number=1000)
49.55303902059097
user2357112

For the first question, "To my surprise, chain ... is much slower", there are two reasons for your observed timings:

  • For small inputs, the timings are dominated by function call overhead. Calling both list and chain.from_iterable is more expensive than calling just sum. The actual work of concatenating the small inputs is faster than the work for making the function and method calls.

  • For larger inputs, the expected quadratic behavior of the a = a + b logic will dominate (see the timing sketch after this list).
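
A quick way to see the quadratic term kick in (a sketch; the absolute numbers will vary by machine) is to time both versions at a few doubling sizes:

import timeit

for n in (1000, 2000, 4000):
    setup = 'l = [[i] for i in range(%d)]' % n
    t_sum = timeit.timeit('sum(l, [])', setup, number=100)
    t_chain = timeit.timeit('list(chain.from_iterable(l))',
                            'from itertools import chain; ' + setup,
                            number=100)
    # sum's time roughly quadruples per doubling; chain's roughly doubles
    print(n, t_sum, t_chain)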

For your other question, "why is it possible on lists where it is prevented on strings", the answer is that we can't detect and report all quadratic cases, so we just report on the one a user is most likely to stumble on accidentally.

Also, the workaround of ''.join(list_of_strings) is harder to figure out if you don't already know about it. In contrast, a performant workaround for lists is much easier to find: t = [], then t += s for each s in list_of_lists.
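
For reference, that string workaround in action (a tiny illustration):

list_of_strings = ['ab', 'cd', 'ef']
print(''.join(list_of_strings))  # 'abcdef' in one linear pass, no quadratic copying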

Using the non-itertools alternative, you should be able to get reasonable performance with simple in-place list extensions:

result = []
for seq in list_of_lists:
    result += seq  # in-place extend; no new list object is created each step

The loop runs at "Python speed" instead of "C speed", but there is no function call overhead and no extra iteration layer, and, more importantly, the in-place extension can take advantage of the known length of the input to pre-allocate the space needed for the result (via __length_hint__).
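
You can inspect that hint yourself with operator.length_hint (a small illustration of the mechanism):

import operator

print(operator.length_hint([4, 5, 6]))             # 3: a list knows its exact length
print(operator.length_hint(x for x in range(10)))  # 0: a generator offers no hint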

One other thought: you should never trust timings that involve growing lists incrementally. The internal logic uses realloc() to resize the list as it grows. In timing suites, the environment is favorable and realloc can often extend in place because no other data is in the way. However, the same logic used in real code can perform much worse, because more fragmented memory forces realloc to copy all the data to a larger empty space. In other words, the timings may not be at all indicative of the actual performance in real code that you care about.

In any case, the main reason sum() behaves the way it does is that Guido van Rossum and Alex Martelli thought it was best for the language.

Raymond Hettinger