0

I just saw someone write the code below and was confused about why sum() could be used to remove the brackets from the lists inside another list:

pwd = [['x'], ['y'], ['z']]

a = sum(pwd, [])
print(a)          # ['x', 'y', 'z']

Looking up the definition of sum(), I found:

sum(iterable, /, start=0)

iterable can be anything (a list, a tuple or a dictionary), but most importantly its items should be numeric.

start is added to the sum of numbers in the iterable. If start is not given in the syntax, it is assumed to be 0.

How does an empty list as the start argument of sum() remove the inner lists from another list? This puzzles me. Could anyone demystify this?

martineau
Jason T.
  • This is very inefficient, though. It has to create the new list `['x', 'y']`, then the new list `['x', 'y', 'z']`. The more lists you are adding, the more copying you are doing from one temporary list to the next. A better solution is `a = list(itertools.chain.from_iterable(pwd))`, which builds the final list all at once in linear, rather than quadratic, time. – chepner Feb 08 '22 at 23:11
  • Removing those brackets from the nested lists is known as "flattening" the outermost list. There are better ways than using `sum()` to do it that don't require numeric values — see [How to make a flat list out of a list of lists?](https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists) – martineau Feb 08 '22 at 23:19
  • **Do not use this algorithm to flatten a nested list**. It is *highly inefficient*, and flattening is trivially accomplished efficiently by other means. – juanpa.arrivillaga Feb 08 '22 at 23:38
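The linear-time alternatives mentioned in the comments above could look something like this. It is a minimal sketch that assumes exactly one level of nesting, as in the question's `pwd`; the names `flat_chain` and `flat_comp` are only illustrative.

```python
import itertools

pwd = [['x'], ['y'], ['z']]

# chain.from_iterable visits each inner list once, so the final list
# is built in a single pass instead of through repeated copying.
flat_chain = list(itertools.chain.from_iterable(pwd))

# An equivalent nested list comprehension.
flat_comp = [item for inner in pwd for item in inner]

print(flat_chain)  # ['x', 'y', 'z']
print(flat_comp)   # ['x', 'y', 'z']
```

Both give the same result as `sum(pwd, [])` without creating a new temporary list at every step.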

4 Answers

4

Think about what sum does. This:

x = sum([1,2,3,4],0)

Is the same as

x = 0 + 1 + 2 + 3 + 4

Similarly,

x = sum([['x'],['y'],['z']], [])

Is the same as

x = [] + ['x'] + ['y'] + ['z']

And that results in x = ['x','y','z']. It's a side effect of the fact that the list type overrides the + operator.
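To make the equivalence concrete, here is a rough pure-Python approximation of what sum() does. `my_sum` is a hypothetical stand-in written for illustration; the real built-in is implemented in C and has extra checks (for example, it refuses string start values).

```python
def my_sum(iterable, start=0):
    """Hypothetical approximation of the built-in sum(), for illustration only."""
    total = start
    for item in iterable:
        # Uses whatever "+" means for the current type of total:
        # numeric addition for ints, concatenation for lists.
        total = total + item
    return total


print(my_sum([1, 2, 3, 4], 0))            # 10
print(my_sum([['x'], ['y'], ['z']], []))  # ['x', 'y', 'z']
```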

Tim Roberts
  • Thanks for the explanation! I never knew sum() had this hidden magic function. But I guess it only works on 2D lists? With a 1D list like `sum(['a', 'b'], ['c'])`, it gives an error. – Jason T. Feb 08 '22 at 22:54
  • And why does `sum([['x'], ['y'], ['z']])` not work either? – Jason T. Feb 08 '22 at 23:04
  • Yes, because the elements of your list are strings. If you did `sum(['a','b'], 'c')`, you'd find the string result `"cab"`. – Tim Roberts Feb 08 '22 at 23:04
  • `sum([['x'],['y'],['z']])` doesn't work because the default starting value is the integer 0. You can't add a list to an integer. – Tim Roberts Feb 08 '22 at 23:05
  • `sum(['a','b'], 'c')` does not result in `"cab"` but in an error: `TypeError: sum() can't sum strings [use ''.join(seq) instead]` – Jason T. Feb 08 '22 at 23:07
  • Thanks! I got it for the case `sum([['x'],['y'],['z']])`. – Jason T. Feb 08 '22 at 23:08
  • @JasonT.-- You're right, my bad. It WOULD have worked that way, but they do a special case check to encourage the smarter path. – Tim Roberts Feb 08 '22 at 23:19
  • @JasonT. this is *not a magic hidden function*. Absolutely **do not use** `sum` to flatten nested lists, or to concatenate various strings. Indeed, the `sum` function will throw an error if you try to do that. Use `sum` only to sum numeric objects – juanpa.arrivillaga Feb 08 '22 at 23:40
  • @TimRoberts no, you'll find `sum(['a','b'], 'c')` raises a `TypeError` because you are trying to join a sequence of strings, so the function raises that error to prevent you shooting yourself in the foot, which they probably should do for all the built-in sequence types. – juanpa.arrivillaga Feb 08 '22 at 23:40
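The string case discussed in these comments can be checked directly. This is a small sketch; the variable names `words`, `error` and `joined` are only illustrative.

```python
words = ['a', 'b']

# sum() rejects a string start value outright instead of concatenating.
try:
    sum(words, 'c')
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError

# The approach the error message recommends: str.join.
joined = 'c' + ''.join(words)
print(joined)  # cab
```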
3

Python doesn't know what addition means. It relies on object methods to do the work. + is really a call to an object's __add__ method. Integers add, but lists extend - at least when adding another list.

sum adds iterated values to the start object. When you make start a list, it sums using the list addition rules. In your case, you start with an empty list, and then each iterated value, also a list, is added, extending the list. It's the same as

>>> a = []
>>> pwd = [['x'], ['y'], ['z']]
>>> for val in pwd:
...     print(val)
...     a = a + val
... 
['x']
['y']
['z']
>>> a
['x', 'y', 'z']

This is part of the dynamic nature of Python and is leveraged in many ways in various packages. numpy and pandas broadcast operations across entire matrices, for example. pathlib overrides division to join paths.

One could argue that any class you implement should prefer overriding the existing "magic methods" that implement Python operators over defining its own methods. Why would a queue have a put when it can implement +=? Okay, there are reasons why that would be a bad choice, too! That's design work.
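As a sketch of the same idea with a user-defined type: any class that implements `__add__` can be summed, provided start is an instance of a compatible type. `Money` is a hypothetical class invented for this example. The pathlib line shows the division operator mentioned above, using `PurePosixPath` so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath


class Money:
    """Hypothetical class, for illustration only."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Called for "self + other"; sum() relies on this for each item.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)

    def __repr__(self):
        return f"Money({self.cents})"


prices = [Money(150), Money(275), Money(80)]
total = sum(prices, Money(0))  # start must be a Money, not the default 0
print(total)  # Money(505)

# pathlib overloads "/" in the same spirit, via __truediv__.
joined_path = PurePosixPath('/usr') / 'lib'
print(joined_path)  # /usr/lib
```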

tdelaney
2

We start with the empty list.

After processing the first element, we have [] + ['x'] == ['x'].

After processing the second element, we have ['x'] + ['y'] == ['x', 'y'].

After processing the third element, we have ['x', 'y'] + ['z'] == ['x', 'y', 'z'], as observed.
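These intermediate results can be reproduced with `itertools.accumulate`, which yields every running total rather than only the last one. This is a sketch that assumes Python 3.8 or later, where the `initial` keyword is available; `steps` is an illustrative name.

```python
import operator
from itertools import accumulate

pwd = [['x'], ['y'], ['z']]

# Each step is the previous running total + the next inner list.
steps = list(accumulate(pwd, operator.add, initial=[]))

for step in steps:
    print(step)
# []
# ['x']
# ['x', 'y']
# ['x', 'y', 'z']
```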

BrokenBenchmark
1

Adding lists just concatenates them so:

sum(pwd,[]) = [] + ['x'] + ['y'] + ['z']
            = ['x', 'y', 'z']

We need the empty list because sum(x) is the same as sum(x,0)

and sum(pwd, 0) = 0 + ['x'] + ['y'] + ['z']

which gives an error as an int cannot be added to a list.
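A quick way to see that failure, and that passing an empty list avoids it, is the sketch below; `error` and `flat` are illustrative names.

```python
pwd = [['x'], ['y'], ['z']]

# With the default start of 0, the first step is 0 + ['x'], which fails.
try:
    sum(pwd)
    error = None
except TypeError as exc:
    error = exc

print(error)  # unsupported operand type(s) for +: 'int' and 'list'

# With a list as start, every step is list + list.
flat = sum(pwd, [])
print(flat)  # ['x', 'y', 'z']
```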

martineau
William