8

... and every for-loop looked like a list comprehension.

Instead of:

for stuff in all_stuff:
    do(stuff)

I was doing (not assigning the list to anything):

[ do(stuff) for stuff in all_stuff ]

This is a common pattern found in list-comp how-tos. 1) OK, so no big deal, right? Wrong. 2) Can't this just be a code-style choice? Super wrong.

1) Yeah, that was wrong. As NiklasB points out, the point of the how-tos is to build up a new list.

2) Maybe, but it's neither obvious nor explicit, so it's better not to use it.

I didn't keep in mind that those how-tos were largely command-line based. After my team yelled at me, wondering why the hell I was building up massive lists and then letting them go, it occurred to me that I might be introducing a major memory-related bug.

So here are my questions. If I were to do this in a very long-running process, where lots of data was being consumed, would this "list" just continue consuming my memory until let go? When will the garbage collector reclaim the memory? After the scope the list is built in is lost?

My guess is yes, it will keep consuming my memory. I don't know how the Python garbage collector works, but I would venture to say that this list will exist until after the last next is called on all_stuff.

EDIT.

The essence of my question is conveyed much more clearly in this question (thanks for the link, Niklas).

sbartell

5 Answers

6

If I were to do this in a very long running process, where lots of data was being consumed, would this "list" just continue consuming my memory until let go?

Absolutely.

When will the garbage collector claim the memory back? After the scope this list is built in is lost?

CPython uses reference counting, so that is the most likely case. Other implementations work differently, so don't count on it.

Thanks to Karl for pointing out that, due to the complex memory management mechanisms used by CPython, this does not mean the memory is immediately returned to the OS after that.

I don't know how the python garbage collector works, but I would venture to say that this list will exist until after the last next is called on all_stuff.

I don't think any garbage collector works like that. Usually they use mark-and-sweep, so it could be quite some time before the list is garbage collected.

This is a common pattern found on list-comp how-to's.

Absolutely not. The point is that you iterate the list with the purpose of doing something with every item (do is called for its side effects). In all the examples of the list-comp HOWTO, the list is iterated to build up a new list based on the items of the old one. Let's look at an example:

# list comp, creates the list [0,1,2,3,4,5,6,7,8,9]
[i for i in range(10)]

# loop, does nothing
for i in range(10):
    i  # meh, just an expression which doesn't have an effect

Maybe you'll agree that this loop is utterly senseless, as it doesn't do anything, in contrast to the comprehension, which builds a list. In your example, it's the other way round: the comprehension is completely senseless, because you don't need the list! You can find more information about the issue in a related question.

By the way, if you really want to write that loop in one line, use a generator consumer like deque.extend. This will be slightly slower than a raw for loop in this simple example, though:

>>> from collections import deque
>>> consume = deque(maxlen=0).extend
>>> consume(do(stuff) for stuff in all_stuff)
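If you want to check the relative speeds yourself, here is a rough benchmarking sketch (the `do` function is a trivial stand-in for the real side-effecting function; exact numbers will vary by machine and Python version):

```python
import timeit
from collections import deque

def do(x):
    return x * 2  # trivial stand-in for the real side-effecting function

data = list(range(100))

def plain_loop():
    for stuff in data:
        do(stuff)

def list_comp():
    [do(stuff) for stuff in data]  # builds and discards a list

consume = deque(maxlen=0).extend

def deque_consume():
    consume(do(stuff) for stuff in data)  # no list is ever built

for fn in (plain_loop, list_comp, deque_consume):
    print(fn.__name__, timeit.timeit(fn, number=1000))
```

The `maxlen=0` is what makes the deque discard every element on arrival, so it consumes the generator in C-speed without storing anything.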
Niklas B.
  • Can you do some `timeit` benchmarks for your last code block? – Blender Mar 17 '12 at 00:26
  • @Blender: Meh, can't seem to be able to prove this... Thanks for forcing me to learn it the hard way :P – Niklas B. Mar 17 '12 at 00:33
  • There was a question some time ago about that: [Passing iterators to any for execution for speed and Why?](http://stackoverflow.com/q/9144934/1132524) – Rik Poggi Mar 17 '12 at 00:34
  • CPython will garbage-collect the object promptly if it is unused (probably not until the end of the function, because of the naive scoping rules), but that's no guarantee that the memory will be returned to the OS any time soon. CPython has several layers of memory management (pooling allocators and so forth) going on behind the scenes. – Karl Knechtel Mar 17 '12 at 00:34
  • extend still need memory, 'any' is better – PasteBT Mar 17 '12 at 00:39
  • @PasteBT it should read `deque(maxlen=0)`; the deque allows you to force it not to actually add any elements when you "extend" it. – Karl Knechtel Mar 17 '12 at 00:42
  • @KarlKnechtel: Thanks again :/ Knew I was missing something. – Niklas B. Mar 17 '12 at 00:43
  • @KarlKnechtel OK, I missed that, but after simple testing, deque(maxlen=0) is a little faster than 'any', but still slower than a simple for loop. – PasteBT Mar 17 '12 at 05:24
3

Try manually doing GC and dumping the statistics.

gc.DEBUG_STATS

Print statistics during collection. This information can be useful when tuning the collection frequency.

From http://docs.python.org/library/gc.html
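A minimal sketch of what that looks like in practice (the statistics themselves are printed to stderr by CPython, and their exact format varies between versions):

```python
import gc

# Print per-collection statistics to stderr during each collection
gc.set_debug(gc.DEBUG_STATS)

# Build and immediately drop a throwaway list, mimicking the misused comprehension
[i for i in range(100000)]

# Force a collection; the return value is the number of unreachable objects found
unreachable = gc.collect()
print("unreachable objects:", unreachable)

gc.set_debug(0)  # turn debug output back off
```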

FlavorScape
2

The CPython GC will reap it once there are no references to it outside of a cycle. Jython and IronPython follow the rules of the underlying GCs.
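One small way to watch CPython's prompt reclamation is to hold only a weak reference to the object, which observes it without keeping it alive. Note this is a sketch of CPython-specific behavior; Jython, IronPython, and PyPy may collect the object later:

```python
import weakref

class Blob(object):
    """Placeholder for some large object."""

blob = Blob()
ref = weakref.ref(blob)  # observe the object without keeping it alive
assert ref() is blob     # still reachable

del blob  # last strong reference gone: refcount hits zero
# On CPython the object is reclaimed immediately, so the weak
# reference is already dead; other implementations may lag behind.
print(ref())  # -> None on CPython
```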

Ignacio Vazquez-Abrams
2

If you like that idiom, `do` returns something that consistently evaluates to either True or False, and you'd consider a similar alternative with no ugly side effects, you can use a generator expression combined with either `any` or `all`.

For functions that return False values (or don't return):

any(do(stuff) for stuff in all_stuff)

For functions that return True values:

all(do(stuff) for stuff in all_stuff)
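One caveat worth verifying yourself: `any` short-circuits at the first truthy result, so if `do` ever returns a truthy value mid-stream, the remaining items are never processed. A minimal sketch with a recording stand-in for `do`:

```python
calls = []

def do(x):
    calls.append(x)  # record each invocation
    return x         # truthy for any non-zero value

# any() stops at the first truthy result, so do() never runs
# for the remaining items:
any(do(x) for x in [0, 0, 3, 4, 5])
print(calls)  # -> [0, 0, 3]; items 4 and 5 were skipped
```

This is why the idiom is only safe when `do` consistently returns falsy values (for `any`) or truthy values (for `all`).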
Eduardo Ivanec
  • Unless `do` has a meaningful return value that simply isn't being looked at here. `any` only exhausts the iterator until the first True value it yields. – lvc Mar 17 '12 at 00:40
  • Note that there are sometimes performance advantages (at least with CPython) with this method vs. a normal loop. – agf Mar 17 '12 at 01:18
0

I don't know how the python garbage collector works, but I would venture to say that this list will exist until after the last next is called on all_stuff.

Well, of course it will, since you're building a list with the same number of elements as all_stuff. The interpreter can't discard the list before it's finished building, can it? You could call gc.collect between one of these loops and the next, but each list will be fully constructed before it can be reclaimed.

In some cases you could use a generator expression instead of a list comprehension, so it doesn't have to build a list with all your values:

(do_something(i) for i in xrange(1000))

However, you'd still have to "exhaust" that generator in some way...
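The laziness is easy to observe with a stand-in function that records its calls (a minimal sketch; `do_something` here is hypothetical):

```python
log = []

def do_something(i):
    log.append(i)  # record the call instead of doing real work
    return i

gen = (do_something(i) for i in range(3))
print(log)   # -> []: nothing has run yet; generator expressions are lazy

next(gen)    # consumes exactly one item
print(log)   # -> [0]

for _ in gen:  # exhaust the rest, one element at a time
    pass
print(log)   # -> [0, 1, 2]
```

At no point does a list of all the results exist; each value is produced and discarded as the generator is consumed.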

mgibsonbr
  • That was the problem, all_stuff was a generator yielding network data. It wasn't going to become exhausted any time soon. – sbartell Mar 17 '12 at 00:39
  • I meant you had to ensure the interpreter would iterate over the generator (sorry, my English). Using one of the other people's suggestions, like `any` or `deque.extend`, would consume each element as soon as it is generated, without storing them in a list. – mgibsonbr Mar 17 '12 at 00:45