I have several lists, stored in a list, itself created in a loop of unknown number of iterations, and I need to concatenate them all. Example:

lists = [range(i) for i in range(1,5)]
lists
Out[1]: [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]

So, now I want to turn them into a single, flat list. I can do this by just adding them:

biglist = lists[0] + lists[1] + lists[2] + lists[3]

...but that gets boring very quickly. I could write a for loop which iterates over the inner lists:

biglist = []
for smallist in lists:
    biglist += smallist
biglist
Out[2]: [0, 0, 1, 0, 1, 2, 0, 1, 2, 3] 

This works, but it takes three lines of code and an intermediate variable, so it cannot be used inline as an expression and gets in the way of a functional style.

But since all I need is to add some lists to each other, and there's already a builtin function for that in Python, it stands to reason I could just use sum(lists) -- however:

sum(lists)
Traceback (most recent call last):
  File "D:\program_files\Anaconda\envs\SPINE_dev\lib\site-packages\IPython\core\interactiveshell.py", line 2878, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-827ffc5ab7d2>", line 1, in <module>
    sum(lists)
TypeError: unsupported operand type(s) for +: 'int' and 'list'

What's the issue? Should this not work? I went looking for an answer and only found this trick, which works, but without explanation:

sum(lists, [])
Out[3]: [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]

Note that the original hint was to use list(sum(lists, [])), but it seems to work just fine without using list(), which looks much better, too.

So, the question: Why do I need to supply an empty list? Secondary question: Why would someone recommend wrapping that statement in a type conversion, and are there scenarios (or Python versions) where that would be necessary?

I'm using Python 2.7.10

Zak
  • I'd suggest reading [How to make a flat list out of list of lists?](https://stackoverflow.com/q/952914/364696), which has many better solutions than `sum` ([`itertools.chain` being the most straightforward, though `functools.reduce`+`operator.iconcat` is slightly faster; both are asymptotically equivalent though, unlike `sum`](https://stackoverflow.com/a/45323085/364696)). – ShadowRanger Dec 31 '19 at 01:58

2 Answers

If you check sum()'s documentation, you will see that, when you do not pass your empty list, the default value of that parameter (called start) is zero:

 sum(iterable, /, start=0)

sum() then takes each value from the iterable you gave it and adds it to a running total that begins at start. With the default, its very first step is to add zero to the first element of your list, which is itself a list. And what happens when you try to add a list to a number? A TypeError:

>>> 0 + [1, 2, 3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'

When you pass the empty list, the first thing sum() does is add that empty list to your first list. In this case there is no error (although, as expected, the result is no different from the first element):

>>> [] + [1, 2, 3]
[1, 2, 3]
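
The behaviour described above can be modelled with a few lines of plain Python. This is only a rough sketch: `sum_sketch` is a hypothetical helper written for illustration, not the real implementation of the built-in, and it assumes nothing beyond the left-to-right addition described in the documentation. `list(range(i))` is used so the example runs on both Python 2 and Python 3.

```python
def sum_sketch(iterable, start=0):
    """Simplified model of the built-in sum(); a hypothetical helper."""
    total = start
    for item in iterable:
        # For lists, + creates a new list on every iteration.
        total = total + item
    return total


lists = [list(range(i)) for i in range(1, 5)]

# With start=[] every step is list + list, so it works.
flat = sum_sketch(lists, [])

# With the default start=0 the first step is 0 + [0], which fails.
error = None
try:
    sum_sketch(lists)
except TypeError as exc:
    error = str(exc)
```

Here `flat` ends up as the flattened list, while `error` holds the same kind of TypeError message shown in the question.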

(That said, I would point out that in real-world problems it is probably better to use itertools.chain(), which is more efficient than sum() because sum() builds an entirely new list at every step. It should not be a problem for the example in your question, though.)
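
As a minimal sketch of the itertools.chain() alternative, assuming the same `lists` as in the question (built with `list(range(i))` so that it also runs on Python 3): `chain.from_iterable` walks the inner lists one after another without building intermediate lists, and `list()` turns the resulting iterator into a real list.

```python
from itertools import chain

lists = [list(range(i)) for i in range(1, 5)]

# chain.from_iterable yields the items of each inner list in turn,
# without creating a new list for every concatenation step.
flat_iter = chain.from_iterable(lists)

# Materialize the lazy iterator as an ordinary list.
flat = list(flat_iter)
```

The result is a new list object, so appending to `flat` later does not modify any of the inner lists.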

brandizzi
  • Well, that's interesting ... the Python 2.7 documentation: https://docs.python.org/2.7/library/functions.html#sum does not mention that default, and neither does the one for 3.5: https://docs.python.org/3.5/library/functions.html#sum – Zak Dec 31 '19 at 01:43
  • @Zak, it's literally right there in the first para (for both Py2 & 3): Sums start and the items of an iterable from left to right and returns the total. START DEFAULTS TO 0. :-) – paxdiablo Dec 31 '19 at 01:48
  • In my case I have pretty small lists, so it won't matter whether I create new ones or not. Actually, it might be safer in my case since that means I can change them later without worrying about the new values proliferating where I don't expect them. Also: doesn't itertools.chain() return an iterator, not a list? I have to use the result like a proper list (adding, appending, eventually converting to numpy array, for actual work). – Zak Dec 31 '19 at 01:48
  • @Zak: Both of them mention it in the text; they were using the old, inconsistent way of describing optional arguments, where the optional argument is in brackets, rather than being assigned a default like in plain Python. The newer docs have been trying to normalize function signatures. – ShadowRanger Dec 31 '19 at 01:49
  • @paxdiablo -- oh dangit, you're right, did not read that far ... thanks! – Zak Dec 31 '19 at 01:49
  • @Zak: You can convert the result of `chain` to a `list` to get a real `list`, e.g. `list(chain.from_iterable(lists))` or `list(chain(list1, list2, list3, list4, list5))`. The difference is that the `chain` solution is `O(n)` work (in total number of items), while `sum` is `O(m * n)` work, where `m` is the number of `list`s. – ShadowRanger Dec 31 '19 at 01:51
  • @Zak: You should probably read up on [Schlemiel the Painter's algorithms](https://en.wikipedia.org/wiki/Schlemiel_the_Painter's_algorithm); using `sum` to concatenate many `list`s is that sort of algorithm. – ShadowRanger Dec 31 '19 at 01:55
  • @ShadowRanger: My lists are pretty small, so I won't start optimizing for speed. I know that generator objects are faster but believed the conversion to ``list`` would negate the advantage -- thanks for correcting that misconception. I had thought that ``sum()`` was smarter than that. – Zak Dec 31 '19 at 02:01

To answer your secondary question: operations on lists (or other iterables) in Python frequently return a lazy iterator, such as a generator object, instead of a list. To get a list you then have to convert the result back with list().

To not answer your primary question: you may want to try itertools.chain(*lists) instead. This concatenates all your lists into one flat iterable, which you can pass to list() if you need an actual list.
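
To make the difference concrete, here is a small sketch, assuming the `lists` from the question (built with `list(range(i))` so it behaves the same on Python 2 and 3). It checks the types of both results: `sum(lists, [])` already produces a list, so wrapping it in list() changes nothing, whereas `chain(*lists)` produces an iterator that needs the conversion.

```python
from itertools import chain

lists = [list(range(i)) for i in range(1, 5)]

summed = sum(lists, [])   # list + list + ... is already a list
chained = chain(*lists)   # an itertools.chain iterator, not a list

summed_is_list = isinstance(summed, list)
chained_is_list = isinstance(chained, list)

# list() consumes the iterator and produces a real list.
flat = list(chained)
```

After the conversion, `flat` supports indexing, appending, and everything else a list does.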

Jaime M
  • In Python 3.x, `range()` returns a `range` object, not a list. That's the reason for the recommendation. – Barmar Dec 31 '19 at 01:49
  • So, using ``list()`` could be useful when using ``itertools.chain()`` (if I don't just want to iterate over the list but use it in other ways) -- but would not appear necessary when using ``sum()``? @Barmar: Thanks, makes sense. Though it also confirms that I can keep omitting it, at least for now :) – Zak Dec 31 '19 at 01:54
  • @Barmar: Note: OP is using Python 2 for some reason, so `range` is still a function returning a `list`. Not sure how that has anything to do with this answer though. – ShadowRanger Dec 31 '19 at 02:00
  • @ShadowRanger That's my point. He doesn't need to convert it now, but the recommendations he asked about are for Python 3. – Barmar Dec 31 '19 at 02:06
  • @Barmar Python 3's `range` objects can't be added to `[]`, so there the recommendation doesn't make sense, either, as already the `sum` crashes. – Stefan Pochmann Dec 31 '19 at 02:13
  • @StefanPochmann That's why you need to convert them to lists before using them in `sum()`. – Barmar Dec 31 '19 at 02:25
  • @Barmar That's not what that recommendation is, though. – Stefan Pochmann Dec 31 '19 at 02:28
  • @StefanPochmann Right, I was thinking of `sum((list(x) for x in ranges), [])` – Barmar Dec 31 '19 at 02:34