
I have a Python script that runs a loop. Within this loop, the function DoDebugInfo is called once per iteration. This function basically writes some pictures to the hard disk using matplotlib, exports a KML file, does some other calculations, and returns nothing.

I'm having the problem that, on each iteration, the function DoDebugInfo eats more and more RAM. I guess some variable is growing in size on each loop.

I added the following lines before and after the call:

print '=== before: ' + str(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
DoDebugInfo(inputs)
print '=== after: ' + str(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)

The output is:

=== before: 71598.08
=== after: 170237.952
=== before: 170237.952
=== after: 255696.896
=== before: 255696.896
=== after: 341409.792

As you can see, before the call the program has a certain memory footprint; after the call it has increased, and it then stays stable until the next call.

Why is this? Since DoDebugInfo(inputs) is a function that returns nothing, how can it be that some variables stay in memory? Is there a need to clear all variables at the end of the function?

Edit: DoDebugInfo uses these functions:

import datetime
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

def plot_line(x,y,kind,lab_x,lab_y,filename):
    fig = plt.figure(figsize=(11,6),dpi=300)
    ax = fig.add_subplot(111)
    ax.grid(True,which='both')
    #print 'plotting'
    if type(x[0]) is datetime.datetime:
        #print 'datetime detected'
        ax.plot_date(matplotlib.dates.date2num(x),y,kind)
        ax.fmt_xdata = DateFormatter('%H')
        ax.autoscale_view()
        fig.autofmt_xdate()
    else:   
        #print 'no datetime'
        ax.plot(x,y,kind)
    xlabel = ax.set_xlabel(lab_x)
    ax.set_ylabel(lab_y)
    fig.savefig(filename,bbox_extra_artists=[xlabel], bbox_inches='tight')

def plot_hist(x,Nbins,lab_x,lab_y,filename):
    fig = plt.figure(figsize=(11,6),dpi=300)
    ax = fig.add_subplot(111)
    ax.grid(True,which='both')
    ax.hist(x,Nbins)
    xlabel = ax.set_xlabel(lab_x)
    ax.set_ylabel(lab_y)
    fig.savefig(filename,bbox_extra_artists=[xlabel], bbox_inches='tight')

and plots 10 figures to the disk using something like:

plot_line(index,alt,'-','Drive Index','Altitude in m',output_dir + 'name.png')

If I comment out the lines that use plot_line the problem does not happen, so the leak should be in these lines of code.

Thanks

msw
otmezger
  • Show us your `DoDebugInfo` function. – eumiro Apr 18 '13 at 10:53
  • A function that returns nothing can still alter globals, or use a mutable parameter that is not cleaned up between calls. – Martijn Pieters Apr 18 '13 at 10:58
  • @eumiro I have narrowed down the leak; please take a look at the functions I'm using inside `DoDebugInfo`. The leak is somewhere in there. Thanks – otmezger Apr 18 '13 at 11:05
  • @MartijnPieters the function does not alter globals... and I don't know what a mutable parameter is, but I'll check it. Thanks – otmezger Apr 18 '13 at 11:06
  • Martijn is referring to the [mutable default argument gotcha](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument). – Lauritz V. Thaulow Apr 18 '13 at 11:06

2 Answers


The problem is that so many figures are created and never closed; pyplot keeps a reference to every figure, so they all stay alive in memory.

I added the line

plt.close()

to each of my plot functions, plot_line and plot_hist, and the problem is gone.
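For reference, a minimal sketch of the pattern (the function name and data below are illustrative, not from the original code): saving the figure and then closing it keeps pyplot's internal figure registry from growing across calls.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; figures are only rendered to files
import matplotlib.pyplot as plt

def plot_and_save(x, y, filename):
    # Create a figure, save it to disk, then close it so pyplot drops its reference.
    fig = plt.figure(figsize=(11, 6), dpi=100)
    ax = fig.add_subplot(111)
    ax.grid(True, which='both')
    ax.plot(x, y, '-')
    fig.savefig(filename, bbox_inches='tight')
    plt.close(fig)  # without this, the figure stays registered and memory grows per call
```

Note that `plt.close()` with no argument closes only the current figure; `plt.close(fig)` closes that specific figure, and `plt.close('all')` closes everything.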

otmezger

Does the size grow without bound? Very few programs (or libraries) return allocated heap to the system even when it is no longer used, and CPython (2.7.3) is no exception. The usual culprit is malloc, which will increase process memory on demand and return freed space to its free list, but never de-allocates what it has requested from the system. This sample code intentionally grabs memory and shows that the process use is bounded and finite:

import resource

def maxrss(start, end, step=1):
    """allocate ever larger strings and show the process rss"""
    for exp in range(start, end, step):
        s = '0' * (2 ** exp)
        print '%5i: %sk' % (exp, 
            resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000)
    # s goes out of scope here and is freed by Python but not returned 
    # to the system

try:
    maxrss(1, 40, 5)
except MemoryError:
    print 'MemoryError'

maxrss(1, 30, 5)

Where the output (on my machine) is, in part:

26: 72k
31: 2167k
MemoryError
 1: 2167k
 6: 2167k
 ...
26: 2170k

This shows that the interpreter failed to get 2**36 bytes of heap from the system, but still had the memory "on hand" to fill later requests. As the last line of the script demonstrates, the memory is there for Python to use, even if it is not currently using it.

msw
  • It actually keeps growing until my Mac gets difficult to operate and the UI is almost frozen. I killed the Python process and it released 4 GB of RAM. I don't know what would happen if I let it go further... – otmezger Apr 18 '13 at 12:50