Consider the following script:
l = [i for i in range(int(1e8))]
l = []
import gc
gc.collect()
# 0
gc.get_referrers(l)
# [{'__builtins__': <module '__builtin__' (built-in)>, 'l': [], '__package__': None, 'i': 99999999, 'gc': <module 'gc' (built-in)>, '__name__': '__main__', '__doc__': None}]
del l
gc.collect()
# 0
The point is that after all these steps the memory usage of this Python process is around 30% on my machine (Python 2.6.5; more details on request). Here's an excerpt of the output of top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5478 moooeeeep 20 0 2397m 2.3g 3428 S 0 29.8 0:09.15 ipython
and the corresponding output of ps aux:
moooeeeep 5478 1.0 29.7 2454720 2413516 pts/2 S+ 12:39 0:09 /usr/bin/python /usr/bin/ipython gctest.py
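For reference, the resident memory can also be checked from within the process itself. This is only a sketch, assuming a Linux system where ru_maxrss is reported in kilobytes:

import resource

def peak_rss_mb():
    # peak resident set size of this process; on Linux, ru_maxrss is in KB
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

print(peak_rss_mb())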
According to the docs for gc.collect:

Not all items in some free lists may be freed due to the particular implementation, in particular int and float.
Does this mean that if I (temporarily) need a large number of distinct int or float objects, I have to move that part of the code to C/C++, because the Python GC fails to release the memory?
Update
The interpreter itself is probably to blame, as this article suggests:
It’s that you’ve created 5 million integers simultaneously alive, and each int object consumes 12 bytes. “For speed”, Python maintains an internal free list for integer objects. Unfortunately, that free list is both immortal and unbounded in size. floats also use an immortal & unbounded free list.
The problem remains, however, as I cannot avoid this amount of data (timestamp/value pairs from an external source). Am I really forced to drop Python and go back to C/C++?
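One thing I could try (just a sketch of the idea, not a final solution) is to keep the timestamp/value pairs in array.array buffers instead of plain lists, so the values are stored as raw C doubles and only short-lived Python float objects are created:

import array

timestamps = array.array('d')   # raw C doubles, not individual Python float objects
values = array.array('d')

for i in xrange(1000000):       # placeholder loop standing in for the external source
    timestamps.append(i * 0.001)
    values.append(float(i) * 2.0)

# each append only creates a transient Python float, so the float free list
# never has to hold millions of objects at once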
Update 2
It probably is indeed the case that the Python implementation causes the problem. I found this answer, which conclusively explains the issue and a possible workaround.
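For the record, here is a sketch of the kind of workaround I have in mind (not necessarily exactly what the linked answer proposes): run the allocation-heavy part in a child process, so the operating system reclaims all of its memory, free lists included, when the child exits. The workload below is only a placeholder:

from multiprocessing import Process, Queue

def crunch(queue):
    # allocation-heavy work happens here; when this process exits, the OS
    # takes back all of its memory, including the int/float free lists
    data = [i for i in range(int(1e7))]   # placeholder for the real workload
    queue.put(sum(data))                  # send back only the small result

if __name__ == '__main__':
    q = Queue()
    p = Process(target=crunch, args=(q,))
    p.start()
    result = q.get()
    p.join()
    print(result)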