
I read in a comment here on Stack Overflow that it is more memory efficient to do slice assignment when changing lists. For example,

a[:] = [i + 6 for i in a]

should be more memory efficient than

a = [i + 6 for i in a]

because the former replaces elements in the existing list, while the latter creates a new list and rebinds a to that new list, leaving the old a in memory until it can be garbage collected. Benchmarking the two for speed, the latter is slightly quicker:

$ python -mtimeit -s 'a = [1, 2, 3]' 'a[:] = [i + 6 for i in a]'
1000000 loops, best of 3: 1.53 usec per loop
$ python -mtimeit -s 'a = [1, 2, 3]' 'a = [i + 6 for i in a]'
1000000 loops, best of 3: 1.37 usec per loop

That is what I'd expect, as rebinding a variable should be faster than replacing elements in a list. However, I can't find any official documentation which supports the memory usage claim, and I'm not sure how to benchmark that.
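
One way to benchmark the memory side (a rough sketch of my own, not from any official documentation; it assumes Python 3.4+ for the standard tracemalloc module) is to compare peak allocations while each version runs:

import tracemalloc

def peak_bytes(func):
    # Return the peak number of bytes allocated while func() runs.
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def slice_assign():
    a = list(range(100000))
    a[:] = [i + 6 for i in a]    # replace the contents in place

def rebind():
    a = list(range(100000))
    a = [i + 6 for i in a]       # rebind the name to a new list

print("slice assignment:", peak_bytes(slice_assign))
print("rebinding:       ", peak_bytes(rebind))

Both helpers build the starting list inside the measured region, so the two peak figures are directly comparable.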

On the face of it, the memory usage claim makes sense to me. However, giving it some more thought, I would expect that in the former method, the interpreter would create a new list from the list comprehension and then copy the values from that list to a, leaving the anonymous list floating around until it is garbage collected. If that's the case, then the former method would use the same amount of memory while also being slower.

Can anyone show definitively (with a benchmark or official documentation) which of the two methods is more memory efficient, and which is the preferred method?

Thanks in advance.

Mitch Lindgren
  • The performance aspects might be worth considering, but I think you are more likely to run into the practical case (in larger programs) where you pass a reference to a list, say, from Class1 to Class2. In the first instance, using slice assignment to modify Class1's list will preserve Class2's reference. In the second instance you cite, modifying Class1's list means that Class2 will be holding a reference to a list that is no longer valid. – Brandon Feb 09 '11 at 17:59
  • @Brandon: That is also true, and I probably should have mentioned the distinction in my question. Thanks for your input. – Mitch Lindgren Feb 09 '11 at 18:03
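
To illustrate the point in the comment above (the names here are made up for the example): when two names refer to the same list, slice assignment updates what both of them see, while rebinding only changes where one name points:

data = [1, 2, 3]
view = data                        # second reference, e.g. held by another object

data[:] = [i + 6 for i in data]    # in place: both names see the change
print(view)                        # [7, 8, 9]

data = [i + 6 for i in data]       # rebinds data only
print(view)                        # still [7, 8, 9], not [13, 14, 15]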

1 Answer


The line

a[:] = [i + 6 for i in a]

would not save any memory. Python does evaluate the right-hand side first, as stated in the language documentation:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

In the case at hand, the single resulting object would be a new list, and the single target in the target list would be a[:].
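
A quick way to see this order of evaluation for yourself (my own check, not part of the original answer; dis.dis() accepts a source string in Python 3) is to disassemble the statement and note that the comprehension's list is built completely before anything is stored into the slice:

import dis

# The bytecode first builds the list from the comprehension,
# and only then performs the store into a[:].
dis.dis("a[:] = [i + 6 for i in a]")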

We could replace the list comprehension by a generator expression:

a[:] = (i + 6 for i in a)

Now, the right-hand side evaluates to a generator instead of a list. Benchmarking shows that this is still slower than the naive

a = [i + 6 for i in a]
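
For reference, all three variants can be compared with the timeit module (a sketch of mine; it assumes Python 3.6+ for the f-string, and the absolute numbers will vary by machine and interpreter):

import timeit

setup = "a = list(range(1000))"
variants = [
    "a[:] = [i + 6 for i in a]",   # slice assignment, list comprehension
    "a[:] = (i + 6 for i in a)",   # slice assignment, generator expression
    "a = [i + 6 for i in a]",      # plain rebinding
]
for stmt in variants:
    # Take the best of five runs of 10,000 executions each.
    best = min(timeit.repeat(stmt, setup=setup, number=10000, repeat=5))
    print(f"{stmt:35} {best:.4f} s per 10000 loops")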

So does the generator expression actually save any memory? At first glance, you might think it does. But delving into the source code of the function list_ass_slice() shows that it does not. The line

v_as_SF = PySequence_Fast(v, "can only assign an iterable");

uses PySequence_Fast() to convert the iterable (in this case the generator) into a temporary list first (a list or tuple on the right-hand side would be passed through unchanged), which is then copied into the old list. That temporary list takes just as much memory as the one the comprehension would have built, so using a generator expression is basically the same as using a list comprehension in this case. During the final copy, only the original list object itself is reused; its contents are replaced.

The moral seems to be that in this case the simplest approach is best in every regard.

Sven Marnach
  • +1 for merciless crushing of the premature (memory) optimizers. –  Feb 09 '11 at 17:59
  • Thanks for the detailed and insightful answer! In response to the commenter above, I'd like to add that this might not have been premature optimization if you were dealing with a list of 5 million elements and had a choice between copying it and not copying it. :) – Mitch Lindgren Feb 09 '11 at 17:59
  • @Mitch: if you have 5 million entries, you are probably better off with e.g. a NumPy array than a Python list. – Sven Marnach Feb 09 '11 at 18:06
  • Looking into the mechanisms under the hood makes this answer a very good and definitive one. +1 – eyquem Nov 07 '11 at 15:55
  • I think it is somewhat reassuring that `a[:] = (x + 6 for x in a)` doesn't save any memory, and that the generator is consumed before any item is assigned to `a[:]`. Otherwise, you'd be iterating on `a` while modifying `a` at the same time, and it could easily get confusing. Consider `a[::-1] = (x + 6 for x in a)`. At a glance it looks like it should reverse `a` and add 6 to every item. And indeed, it does. But if the generator was consumed lazily during the assignment, then it'd be equivalent to `for i,x in enumerate(a): a[~i] = x + 6`, whose exact behaviour is hard to understand at a glance. – Stef Mar 05 '23 at 09:37
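
A quick check of the example in the comment above (just an illustrative sketch):

a = [1, 2, 3]
a[::-1] = (x + 6 for x in a)   # the generator is fully consumed before the slice is assigned
print(a)                       # [9, 8, 7]: reversed, with 6 added to every item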