6

I'm using python 3.6. I came across the below way to flatten the nested list using sum:

a = [[1, 2], [3, 4], [5, 6]]

sum(a,[])

which returns:

[1,2,3,4,5,6]

What exactly is going on here? Sum takes an iterable, in this case a list, and a start value. I don't understand what python reads to flatten the list.

Moinuddin Quadri
  • 46,825
  • 13
  • 96
  • 126
RhythmInk
  • 513
  • 1
  • 4
  • 18
  • 1
    *Don't do that*, and also where the heck did the 6 go? – user2357112 Apr 17 '18 at 21:12
  • Accidentally missed it. Edit has been made. – RhythmInk Apr 17 '18 at 21:13
  • 1
    **Better:** `[x for sublist in a for x in sublist]`. **Why better:** duck-types for tuples. Doesn't create a bunch of useless intermediate lists which just get deleted quickly after. – wim Apr 17 '18 at 21:13
  • Why is that better? Usually I used in-list loop, but this seems much better... – Chris Farr Apr 17 '18 at 21:14
  • 1
    Not *quite* a dup, but [this](https://stackoverflow.com/a/952946/1040092) answer discussed this. – Wondercricket Apr 17 '18 at 21:15
  • @wim or better still `list(itertools.chain.from_iterable(a))`. – Paul Panzer Apr 17 '18 at 21:27
  • So in order to create a six item list this creates an empty list, then a brand new two item list, then a brand new four item list, then finally a six item list. All previous lists are thrown away. At each step all the references are copied. The performance is horrible. You might not notice it with list this small, but indescriminate use of algorithms like this kill performance even if you're using the fastest of languages. – Steven Rumbalski Apr 17 '18 at 21:51

4 Answers4

9

This is just a result of how Python interprets addition of lists. From the docs

sum(iterable[, start])

Sums start and the items of an iterable from left to right and returns the total.

Since sum starts by adding the first element of the iterable to the start argument, you have:

[] + [1, 2] = [1, 2]

Then it continues adding items from the iterable:

[1, 2] + [3, 4] = [1, 2, 3, 4]
[1, 2, 3, 4] + [5, 6] = [1, 2, 3, 4, 5, 6]
Community
  • 1
  • 1
bphi
  • 3,115
  • 3
  • 23
  • 36
8

sum([a, b, c], d) produces d + a + b + c.

In your example, a, b, c, and d are [1, 2], [3, 4], [5, 6], and [].

sum([[1, 2], [3, 4], [5, 6]], []) produces [] + [1, 2] + [3, 4] + [5, 6], which is [1, 2, 3, 4, 5, 6] because + is concatenation for lists.

This is absurdly inefficient, because every + operation involved requires copying all the data from each of its arguments:

In [7]: x = [[i] for i in range(30000)]

In [8]: %timeit sum(x, [])
1 loop, best of 3: 2.06 s per loop

In [9]: %timeit [elem for sublist in x for elem in sublist]
1000 loops, best of 3: 1.91 ms per loop

sum(x, []) takes quadratic time, whereas a more efficient implementation takes linear time. Never do sum(x, []).

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • 1
    Joel Spolsky calls things like `sum(x, [])` [Shlemiel the painter’s algorithm](https://www.joelonsoftware.com/2001/12/11/back-to-basics/). ;) – PM 2Ring Oct 05 '18 at 11:34
6

As the sum(iterable[, start]) document says:

Sums start and the items of an iterable from left to right and returns the total. start defaults to 0. The iterable’s items are normally numbers, and the start value is not allowed to be a string.

So, in the example you shared:

sum(a,[])

Here, iterable is a (which is [[1, 2], [3, 4], [5, 6]]) and start is []. Hence, the resultant is equivalent to:

[] + [1, 2] + [3, 4] + [5, 6]

# i.e. you flatten list --> [1, 2, 3, 4, 5, 6] 
Moinuddin Quadri
  • 46,825
  • 13
  • 96
  • 126
2

The start argument gives the function the starting point. It's what is being added to. So sum([1,2,3]) returns 6 and sum([1,2,3],5) returns 11. In your case, because you're passing in an 2-d list and an empty list, the function is going to add everything in the first argument to the second argument. Essentially, you're doing this:

[]+[1,2]+[3,4]+[5,6]

It's a bit of a quirk of python's operator overloading.

mypetlion
  • 2,415
  • 5
  • 18
  • 22