
I have a "memory leak" issue with a pandas DataFrame. Apparently this is a known issue: "Memory leak using pandas dataframe".

The trick used in the answer there (calling gc.collect() to manually collect garbage and free memory) works, but it is quite slow.
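Because gc.collect() does give the memory back, the objects that pile up are presumably unreachable reference cycles: plain reference counting would otherwise free each iteration's objects as soon as they go out of scope. A minimal diagnostic sketch to see which types end up in those cycles, assuming one iteration of the receiving loop is wrapped in a callable (`garbage_types` and `make_garbage` are hypothetical names, not part of pandas or the gc module):

```python
import gc
from collections import Counter


def garbage_types(make_garbage, n=100):
    """Run make_garbage() n times and count the types of cyclic garbage.

    make_garbage is a hypothetical callable that performs one loop
    iteration (e.g. one b.recv() plus the column lookup). With
    gc.DEBUG_SAVEALL set, unreachable objects are appended to gc.garbage
    instead of being freed, so they can be inspected afterwards.
    """
    gc.collect()  # start from a clean slate
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        for _ in range(n):
            make_garbage()
        gc.collect()  # find everything the loop left behind
        return Counter(type(obj).__name__ for obj in gc.garbage)
    finally:
        gc.set_debug(0)     # restore normal behaviour
        gc.garbage.clear()  # drop the saved objects so a later collection can free them


if __name__ == "__main__":
    class Cyclic:
        def __init__(self):
            self.me = self  # self-reference: only the cyclic GC can free it

    print(garbage_types(Cyclic, n=10).most_common(3))
```

Passing one real iteration of the receiver in place of `Cyclic` should show which objects are involved; the output is only a hint about where the cycles come from, not a fix.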

My problem is that I need to run this loop at 500 Hz, which leaves a budget of 2 ms per iteration:

  • without the garbage collector: memory leak, but 0.3-0.4 ms/loop
  • with gc.collect() in the loop: 11 ms/loop!

(measured over 1000 loops with time.time(); this is not exact, but it gives a good idea of the problem)
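Since the requirement is a reliable rate rather than a good average, the number to watch is the worst iteration, not the mean over 1000 loops. A minimal timing sketch using time.perf_counter(), which the Python docs describe as the clock with the highest available resolution for measuring short durations (`measure_loop` and its parameters are hypothetical names chosen for this example):

```python
import time


def measure_loop(body, n_iter=1000, target_hz=500.0):
    """Time n_iter calls of body() with time.perf_counter().

    Returns (mean_s, worst_s, overruns), where overruns counts the
    iterations that took longer than the per-iteration budget 1 / target_hz.
    """
    budget = 1.0 / target_hz  # 500 Hz -> 0.002 s = 2 ms per iteration
    durations = []
    for _ in range(n_iter):
        start = time.perf_counter()
        body()
        durations.append(time.perf_counter() - start)
    mean = sum(durations) / len(durations)
    worst = max(durations)
    overruns = sum(d > budget for d in durations)
    return mean, worst, overruns


if __name__ == "__main__":
    mean, worst, overruns = measure_loop(lambda: sum(range(1000)))
    print(f"mean {mean * 1e3:.3f} ms, worst {worst * 1e3:.3f} ms, "
          f"{overruns} iterations over budget")
```

With this, calling gc.collect() once every 1000 cycles would show up as a single overrun even if the mean stays well under 2 ms.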

My question is: what alternatives are there to gc.collect(), which works fine but is too slow? I can't call it once every 1000 cycles, because that particular cycle would be extremely slow and I need a reliable frequency.
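One alternative to try, untested on the DataFrame loop itself: disable automatic collection with gc.disable() so no collection fires at an unpredictable iteration, and call gc.collect(0) explicitly every iteration. gc.collect(0) only examines the youngest generation, so it is typically much cheaper than a full gc.collect(); whether it frees the DataFrame cycles depends on them still being in generation 0 when it runs. A minimal sketch with a hypothetical `Node` class standing in for the cyclic garbage and a hypothetical `run_loop` helper:

```python
import gc
import time


class Node:
    """Hypothetical stand-in for an object caught in a reference cycle."""

    def __init__(self):
        self.me = self  # self-reference: refcounting alone cannot free it


def run_loop(n_iter, collect_generation=None):
    """Run n_iter iterations that each leave one reference cycle behind.

    Automatic collection is disabled for the duration of the loop. If
    collect_generation is not None, gc.collect(collect_generation) is
    called explicitly at the end of every iteration.
    Returns (worst iteration time in seconds, Node objects still alive).
    """
    was_enabled = gc.isenabled()
    gc.disable()  # no collection can fire at an unpredictable iteration
    try:
        worst = 0.0
        for _ in range(n_iter):
            start = time.perf_counter()
            Node()  # becomes unreachable garbage immediately
            if collect_generation is not None:
                gc.collect(collect_generation)
            worst = max(worst, time.perf_counter() - start)
        alive = sum(isinstance(obj, Node) for obj in gc.get_objects())
        return worst, alive
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    gc.collect()
    print("gen-0 collect each loop:", run_loop(1000, collect_generation=0))
    print("no collection:          ", run_loop(1000))
```

A caveat: in CPython's generational collector, objects that survive a generation-0 pass are promoted and are not examined again by gc.collect(0), so an occasional full collection at a moment where a late cycle is acceptable may still be needed.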

The code I use for testing is the following:

import gc  # only needed if the gc.collect() line below is re-enabled
import os
import time  # only needed for the commented-out timing code
from multiprocessing import Pipe, Process

import pandas as pd


def sender(a):  # this one does not leak
    print("sender :", os.getpid())
    while True:
        data = pd.DataFrame([[1., 2., 3.]], columns=['a', 'b', 'c'])
        a.send(data)


def main(b):  # this one causes a memory "leak", but only when the pipe is used
    try:
        print("receiver :", os.getpid())
        i = 0
        # t = time.time()  # for timing purposes
        while True:
            data = b.recv()
            cmd = data['a'].values[0]
            i += 1
            # gc.collect()  # removes the memory leak, but is very slow
            # if i % 1000 == 0:  # print the mean loop time every 1000 loops
            #     t1 = time.time()
            #     print(i)
            #     print((t1 - t) / 1000)
            #     t = t1
    except (Exception, KeyboardInterrupt) as e:
        print("Exception:", e)
        raise


if __name__ == "__main__":  # required by multiprocessing on platforms that spawn
    a, b = Pipe()
    p = Process(target=main, args=(b,))
    q = Process(target=sender, args=(a,))
    try:
        p.start()
        q.start()
        # wait for the workers so Ctrl+C lands in this try block
        p.join()
        q.join()
    except (Exception, KeyboardInterrupt) as e:
        print("Exception in main:", e)
        p.terminate()
        q.terminate()
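Since the comment in main() says the leak only appears when DataFrames go through the pipe, another option worth measuring is to keep DataFrames out of the pipe entirely and send plain Python values, so the receiver never unpickles a DataFrame. A minimal sketch, untested against the original leak and assuming a fixed column order agreed on by both processes (`COLUMNS`, `send_row` and `recv_value` are hypothetical names):

```python
from multiprocessing import Pipe

# Hypothetical fixed column layout shared by sender and receiver;
# it replaces the DataFrame's column labels.
COLUMNS = ("a", "b", "c")
COL_INDEX = {name: i for i, name in enumerate(COLUMNS)}


def send_row(conn, row):
    """Send one row as a plain tuple of floats instead of a DataFrame."""
    conn.send(tuple(float(x) for x in row))


def recv_value(conn, column):
    """Receive one row and return the value of the requested column."""
    row = conn.recv()
    return row[COL_INDEX[column]]


if __name__ == "__main__":
    a, b = Pipe()
    # Both ends live in this process only to show the round trip;
    # in the setup above, `a` would be used by the sender process.
    send_row(a, [1.0, 2.0, 3.0])
    print(recv_value(b, "a"))  # prints 1.0
```

The trade-off is losing column labels on the wire; if the receiver needs a DataFrame anyway, rebuilding one per message would bring back the objects this approach tries to avoid.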
Asked by CoMartel
