
I've seen a few related questions posted a couple of years ago, but I would like to know if any solution has come up since then.

I have huge dictionaries of dictionaries: about 4 of them, each around 500 MB, in memory at once. As the program runs, I need to delete one of those 4 dictionaries and release its memory back to the OS. Spawning a new subprocess to do the allocation, as suggested in some of the previous posts, is not an option for me.

Here's some code to illustrate the problem:

import resource
import gc
import time

# Note: getrusage() reports ru_maxrss in kilobytes on Linux
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "memory usage:", mem
test_dict = {}
for i in range(100000):
    test_dict[i] = "AAAAAAAA"
    if i % 10000 == 0:
        mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print "memory usage:", mem

mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "memory usage: (dict created): ", mem
del test_dict
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "mem usage: (dict deleted)", mem
gc.collect()
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "mem usage (garbage collection)", mem
print "sleeping for a few seconds"
time.sleep(30)
gc.collect()
mem = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print "memory usage after sleeping ", mem

Here's the result. The memory is reported in KB.

memory usage: 5152
memory usage: 8316
memory usage: 9176
memory usage: 9176
memory usage: 12076
memory usage: 12076
memory usage: 12076
memory usage: 12076
memory usage: 12076
memory usage: 12076
memory usage: 17548
memory usage: (dict created):  17548
mem usage: (dict deleted) 17548
mem usage (garbage collection) 17548
sleeping for a few seconds
memory usage after sleeping  17548

As you can see, the memory does not seem to be freed at all. I tried this on my Ubuntu 11.10 machine with Python 2.7.2.

Phani

2 Answers


According to man getrusage:

ru_maxrss (since Linux 2.6.32)
    This is the maximum resident set size used (in kilobytes).

If I understand it correctly, that means peak usage rather than current usage.

EDIT:

It is also worth looking at the Memory Management article in Python's official docs.
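If you want the *current* resident set size from within the script rather than the peak, one option on Linux is to parse `/proc/self/status`. This is just an illustrative sketch (`current_rss_kb` is a made-up helper name, and the `/proc` layout is Linux-specific):

```python
import resource

def current_rss_kb():
    # Parse the VmRSS line of /proc/self/status (Linux-specific);
    # fall back to the peak from getrusage() if the file is missing.
    try:
        with open('/proc/self/status') as f:
            for line in f:
                if line.startswith('VmRSS:'):
                    return int(line.split()[1])  # value is already in KB
    except IOError:
        pass
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(current_rss_kb())
```

Unlike `ru_maxrss`, this value goes back down once memory is actually returned to the OS.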

Ihor Kaharlichenko
  • Thanks! Do you know of any way to get Python's memory usage from within the script itself instead of using other programs like top or ps? – Phani Aug 17 '12 at 14:14
  • That's easy to test, just add that function to your code and print its output after each of your older prints. According to my tests `memory_usage()['rss']` looks like a reasonable value and it *does* decrease after dictionary deletion. – Ihor Kaharlichenko Aug 17 '12 at 15:03

As Ihor Kaharlichenko points out, ru_maxrss is the peak usage of the program. Consider the following program which is very similar to yours:

import time
time.sleep(10)
string = ' ' * int(5e8) # 500 MB string
time.sleep(10)
string = None # the huge string is automatically GC'd here
time.sleep(10)

If you watch the memory usage of this in top or whatever, you'll see it is very small for the first 10 seconds, then spikes to ~500 MB for a little while, and then drops again. Your program exhibits the same behaviour.
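The divergence between peak and current usage can also be observed from inside the process. The sketch below assumes Linux, where `ru_maxrss` is in KB and `/proc/self/status` exposes the current RSS as `VmRSS` (the helper names are mine, not from any library):

```python
import resource

def peak_kb():
    # Peak resident set size, in KB on Linux; never decreases
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def current_kb():
    # Current resident set size from /proc/self/status (Linux-specific)
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

big = ' ' * int(1e8)          # ~100 MB string
peak_while_alive = peak_kb()  # peak now includes the big allocation
big = None                    # refcount hits zero; CPython frees it at once
print(peak_kb())              # peak stays high...
print(current_kb())           # ...while current RSS drops back down
```

Large allocations like this are typically served by `mmap` and handed straight back to the OS on free, which is why `VmRSS` drops while `ru_maxrss` does not.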

huon
  • +1 Thank you. So, I think I am dealing with a memory leak in my actual program. – Phani Aug 17 '12 at 14:16
  • @Phani, no, you aren't. It is extremely unlikely that you will have a memory leak in a Python program that uses built-in modules only. – huon Aug 17 '12 at 14:19
  • I meant some still-referenced objects that are not being GC'd. Also, I found a contradictory [answer](http://stackoverflow.com/a/7669279/1224076) regarding what ru_maxrss refers to; can you please check it out? It is what I based my script on, but it doesn't seem to match what top reports now. – Phani Aug 17 '12 at 14:26
  • 2
    @dbaupp - yes, just because Python does garbage collection, don't assume that it is immune to leakage of memory. It's just not the old C-style leakage of "I malloc'ed but didn't free". More like, "I kept around an extra reference to an object, so the GC won't free it." Can happen invisibly in memoizers or module-level containers (lists, dicts, sets, etc.), especially in imported modules where you might not realize that an extra reference is being kept. – PaulMcG Aug 17 '12 at 14:37