35

Here is my code:

from memory_profiler import profile

@profile
def mess_with_memory():
    huge_list = range(20000000)
    del huge_list
    print "why this kolaveri di?"

This is what the output is, when I ran it from interpreter:

Line #    Mem usage    Increment   Line Contents

 3      7.0 MiB      0.0 MiB   @profile
 4                             def mess_with_memory():
 5                             
 6    628.5 MiB    621.5 MiB       huge_list = range(20000000)
 7    476.0 MiB   -152.6 MiB       del huge_list
 8    476.0 MiB      0.0 MiB       print "why this kolaveri di"

As the output shows, creating the huge list consumed 621.5 MiB, while deleting it freed only 152.6 MiB. When I checked the docs, I found the statement below:

the statement del x removes the binding of x from the namespace referenced by the local scope

So I guess it didn't delete the object itself, but just unbound the name. But what did the unbinding do that freed up so much space (152.6 MiB)? Can somebody please take the trouble to explain what is going on here?

Manish Jain
  • 2
    `del huge_list` and `huge_list = None` are [roughly] equivalent for the sake of discussing object reachability. – user2864740 Jan 10 '14 at 20:09
  • Do you actually have a problem like your program eventually running out of space and raising a `MemoryError`, or throwing your computer into swap-thrashing hell? If there's no _visible_ problem, there may actually be no problem worth worrying about. – abarnert Jan 10 '14 at 20:09
  • 2
    @abarnert: Yeah, it's just for "improve my understanding of python" purpose. – Manish Jain Jan 10 '14 at 20:13
  • 5
    152.6 MiB is almost exactly 8 bytes per list element. Seems within the realm of reason. I'd be more curious to know what took up the other 469 MiB. – Mark Ransom Jan 10 '14 at 20:22
  • The remainder is 24 bytes per element plus a bit of slop, and 24 bytes happens to be the size of a `PyInt` header in a default build of 64-bit CPython 2.7, so… it's possible that most or all of the `PyInt` memory is sitting around in free lists at one level or another, while the `PyList`'s internal storage buffer (152MiB worth of pointers to those PyInt objects) got reclaimed because it was one giant allocation (possibly even directly allocated in a single `mmap` or `VirtualAlloc` call) instead of a bunch of little ones. – abarnert Jan 10 '14 at 20:38

2 Answers

53

Python is a garbage-collected language. If a value isn't "reachable" from your code anymore, it will eventually get deleted.

The del statement, as you saw, removes the binding of your variable. Variables aren't values, they're just names for values.

If that variable was the only reference to the value anywhere, the value will eventually get deleted. In CPython in particular, the garbage collector is built on top of reference counting. So, that "eventually" means "immediately".* In other implementations, it's usually "pretty soon".

If there were other references to the same value, however, just removing one of those references (whether by del x, x = None, exiting the scope where x existed, etc.) doesn't clean anything up.**
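As a rough illustration of the point about references, here is a minimal sketch that assumes CPython's reference counting. The names `Payload`, `demo`, `alias`, and `probe` are made up for this example; a weak reference is used only to observe whether the object still exists, without keeping it alive.

```python
import weakref


class Payload(object):
    """Hypothetical stand-in for any value; plain lists cannot be weakly referenced."""
    pass


def demo():
    value = Payload()
    alias = value                  # a second name bound to the same object
    probe = weakref.ref(value)     # observes the object without keeping it alive

    del value                      # removes one binding; `alias` still refers to the object
    alive_after_first_del = probe() is not None

    del alias                      # removes the last reference
    alive_after_second_del = probe() is not None
    return alive_after_first_del, alive_after_second_del


result = demo()
print(result)
```

On CPython the object should survive the first `del` and be gone right after the second. Other implementations may reclaim it later, so the second value is not guaranteed there.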


There's another issue here. I don't know what the memory_profiler module (presumably this one) actually measures, but the description (talking about use of psutil) sounds like it's measuring your memory usage from "outside".

When Python frees up storage, it doesn't always—or even usually—return it to the operating system. It keeps "free lists" around at multiple levels so it can re-use the memory more quickly than if it had to go all the way back to the OS to ask for more. On modern systems, this is rarely a problem—if you need the storage again, it's good that you had it; if you don't, it'll get paged out as soon as someone else needs it and never get paged back in, so there's little harm.

(On top of that, what I referred to as "the OS" above is really an abstraction made up of multiple levels, from the malloc library through the core C library to the kernel/pager, and at least one of those levels usually has its own free lists.)
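The numbers in the question's profiler output fit this picture reasonably well. The arithmetic below is only a back-of-the-envelope sketch: it assumes a 64-bit CPython 2.7 build, with 8-byte pointers in the list's buffer and 24 bytes per int object (the sizes suggested in the comments on the question), and it ignores allocator overhead and the cached small ints.

```python
MIB = 1024 * 1024
ELEMENTS = 20000000        # length of the list in the question
POINTER_BYTES = 8          # assumption: 64-bit build, one pointer per list slot
INT_OBJECT_BYTES = 24      # assumption: size of an int object in 64-bit CPython 2.7


def estimate_mib(elements=ELEMENTS):
    """Return (list pointer buffer, int objects) sizes in MiB under the assumptions above."""
    pointer_buffer = elements * POINTER_BYTES / float(MIB)
    int_objects = elements * INT_OBJECT_BYTES / float(MIB)
    return pointer_buffer, int_objects


pointer_buffer_mib, int_objects_mib = estimate_mib()
print("list pointer buffer: %.1f MiB" % pointer_buffer_mib)
print("int objects:         %.1f MiB" % int_objects_mib)
```

Under these assumptions the pointer buffer comes to about 152.6 MiB, which matches the amount the profiler reported as freed, while the roughly 457.8 MiB of int objects is plausibly what stayed in free lists.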

If you want to trace memory use from the inside perspective… well, that's pretty hard. It gets a lot easier in Python 3.4 thanks to the new tracemalloc module. There are various third-party modules (e.g., heapy/guppy, Pympler, meliae) that try to get the same kind of information with earlier versions, but it's difficult, because getting information from the various allocators, and tying that information to the garbage collector, was very hard before PEP 445.
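For the "inside" view, a minimal sketch using `tracemalloc` might look like the following. It assumes Python 3.4 or later; the function name `measure_list_allocation` and the list size are made up for the example. Note that it reports what Python's allocators consider in use, not what the OS sees.

```python
import tracemalloc


def measure_list_allocation(n=100000):
    """Return traced bytes before building a list, while it exists, and after del."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()

    data = list(range(n))          # allocates the list buffer and the int objects
    during, _peak = tracemalloc.get_traced_memory()

    del data                       # last reference gone; the blocks are freed
    after, _peak = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return before, during, after


before, during, after = measure_list_allocation()
print("before=%d during=%d after=%d (bytes traced)" % (before, during, after))
```

Because `tracemalloc` hooks the allocators directly, the traced size should drop after `del` even when an external tool such as `memory_profiler` shows the process still holding the memory.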


* In some cases, there are references to the value… but only from other references that are themselves unreachable, possibly in a cycle. That still counts as "unreachable" as far as the garbage collector is concerned, but not as far as reference counts are concerned. So, CPython also has a "cycle detector" that runs every so often and finds cycles of mutually-reachable but not-reachable-from-anyone-else values and cleans them up.

** If you're testing in the interactive console, there may be hidden references to your values that are hard to track, so you might think you've gotten rid of the last reference when you haven't. In a script, it should always be possible, if not easy, to figure things out. The gc module can help, as can the debugger. But of course both of them also give you new ways to add additional hidden references.

abarnert
  • 3
    Not reachable? References? You can always try* to restore exiled variables by id! http://stackoverflow.com/a/15012814/194586 (*may or may not cause a segfault) – Nick T Jan 10 '14 at 20:10
  • 2
    @NickT Those variables aren't "dead" .. in any case, the very fact that "may cause a segfault" means that the objects *can* be reclaimed (and were thus *not reachable* via a GC root; that post only shows that objects can be fetched via an *opaque identifier* if they still exist). – user2864740 Jan 10 '14 at 20:12
  • 6
    @user2864740: Exactly. This is documented not to work; the fact that it happens to sometimes work, and sometimes segfault, sometimes cause a segfault half an hour later because you corrupted something, and sometimes silently just give you the wrong values only counts as "working" if you really stretch the definition… – abarnert Jan 10 '14 at 20:15
  • Where can I read about things like *It keeps "free lists" around at multiple levels so it can re-use the memory*? Is it documented anywhere? – Ashwini Chaudhary Jan 10 '14 at 21:04
  • 3
    @AshwiniChaudhary: In 2.x, other than the [small bit that's relevant to writing C extensions](http://docs.python.org/2/c-api/memory.html), it's only documented in the source code, primarily in [`obmalloc.c`](http://hg.python.org/cpython/file/2.7/Objects/obmalloc.c). – abarnert Jan 10 '14 at 21:09
  • 1
    @AshwiniChaudhary: In 3.0-3.3, there's a bit more, but 3.4 is where they really started cleaning things up. First, Python now exposes the memory allocator to extensions and allows them to hook it, so [the docs](http://docs.python.org/3.4/c-api/memory.html) have a lot more detail. And the [source](http://hg.python.org/cpython/file/0e5df5b62488/Objects/obmalloc.c) is more readable. And [PEP 445](http://www.python.org/dev/peps/pep-0445/) and the various links from it have more info. – abarnert Jan 10 '14 at 21:11
  • 1
    @AshwiniChaudhary: But for custom allocators like the one used by `list`, you still have to search around the code, like [`listobject.c`](http://hg.python.org/cpython/file/0e5df5b62488/Objects/listobject.c); as far as I know there's no list anywhere of all of them. – abarnert Jan 10 '14 at 21:14
  • @abarnert Thanks for the links and a great answer. I wish I could give it a +10. :-) – Ashwini Chaudhary Jan 10 '14 at 21:16
  • 2
    @abarnert: Thanks. The gc module did help: calling gc.collect() freed up the rest of the memory :) – Manish Jain Jan 12 '14 at 16:12
0

In addition to abarnert's excellent answer, I would like to add an example of a reference cycle in Python:

class RefCycleExample:
    def __init__(self):
        self.myself = self

    def __del__(self):
        print("deleting")

obj = RefCycleExample()
del obj

In the example above, after del obj, the object is unreachable; however, because it has an attribute that points to itself (a reference cycle), it is not garbage-collected immediately. Instead, it is collected at some later time by the cycle detector, or when the interpreter executes gc.collect().
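To observe this behaviour, a small sketch along the following lines may help. It assumes CPython; the names `Node` and `cycle_needs_collector` are made up for the example, and `__del__` is left out on purpose because cycles containing finalizers were not collected before Python 3.4.

```python
import gc
import weakref


class Node(object):
    """Hypothetical class whose instances refer to themselves."""
    def __init__(self):
        self.myself = self         # creates the reference cycle


def cycle_needs_collector():
    was_enabled = gc.isenabled()
    gc.disable()                   # keep the automatic cycle detector out of the demo
    try:
        obj = Node()
        probe = weakref.ref(obj)   # observes the object without keeping it alive

        del obj                    # the refcount stays above zero because of the cycle
        alive_after_del = probe() is not None

        gc.collect()               # run the cycle detector explicitly
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


outcome = cycle_needs_collector()
print(outcome)
```

Here the object should still exist right after `del obj` and be gone once `gc.collect()` has run.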

Terence