I have a "memory leak" issue with pandas DataFrame. Apparently this is a known issue: Memory leak using pandas dataframe
The trick used in the answer (calling gc.collect() to manually collect garbage and free memory) works, but it is quite slow.
My problem is that I need to run this loop at 500 Hz:
- without the garbage collector: memory leak, but 0.3-0.4 ms/loop
- with gc.collect() in the loop: 11 ms/loop!
(Tested over 1000 loops with time.time(); this may not be exact, but it gives a good idea of the problem.)
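For reference, here is a minimal sketch of how such a per-loop average could be measured. It assumes Python 3 and uses time.perf_counter() instead of time.time() for better resolution. The names mean_loop_time and make_cycle are hypothetical, and make_cycle is only a stand-in workload that creates a small reference cycle, not the pandas code from the question.

```python
import gc
import time


def mean_loop_time(body, n_loops=1000, collect_each_loop=False):
    """Return the mean wall-clock time per iteration, in seconds."""
    start = time.perf_counter()
    for _ in range(n_loops):
        body()
        if collect_each_loop:
            gc.collect()  # full collection on every iteration
    return (time.perf_counter() - start) / n_loops


def make_cycle():
    # Hypothetical stand-in workload: builds a self-referencing dict,
    # which reference counting alone cannot free.
    node = {}
    node["self"] = node


if __name__ == "__main__":
    plain = mean_loop_time(make_cycle)
    collected = mean_loop_time(make_cycle, collect_each_loop=True)
    print("without gc.collect(): %.4f ms/loop" % (plain * 1e3))
    print("with gc.collect():    %.4f ms/loop" % (collected * 1e3))
```

The absolute numbers depend on the machine and on how many objects are alive in the process, so they should only be compared against each other.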
My question is: what are the alternatives to gc.collect(), which works fine but is too slow? I can't call it only once every 1000 cycles, because that particular cycle would be extremely slow and I need a reliable frequency.
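The concern about one occasional slow cycle can be checked by timing every iteration individually rather than averaging. The sketch below assumes Python 3. The names iteration_times and make_cycle are hypothetical, and make_cycle is again only a stand-in workload, not the pandas receiver loop.

```python
import gc
import statistics
import time


def iteration_times(body, n_loops=3000, collect_every=0):
    """Time each iteration separately.

    If collect_every is non-zero, gc.collect() is called on every
    collect_every-th iteration and its cost is included in that iteration.
    """
    durations = []
    for i in range(1, n_loops + 1):
        start = time.perf_counter()
        body()
        if collect_every and i % collect_every == 0:
            gc.collect()
        durations.append(time.perf_counter() - start)
    return durations


def make_cycle():
    # Hypothetical stand-in workload: a self-referencing dict.
    node = {}
    node["self"] = node


if __name__ == "__main__":
    d = iteration_times(make_cycle, collect_every=1000)
    print("median: %.4f ms" % (statistics.median(d) * 1e3))
    print("worst:  %.4f ms" % (max(d) * 1e3))
```

Comparing the worst iteration with the median shows how much jitter a periodic collection introduces, which is what matters for a fixed-rate loop.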
The code I use for testing is the following:
import pandas as pd
import os
import gc
from multiprocessing import Process, Pipe
import time

a, b = Pipe()


def sender(a):  # this one does not leak
    print "sender :", os.getpid()
    while True:
        Data = pd.DataFrame([[1., 2., 3.]], columns=['a', 'b', 'c'])
        a.send(Data)


def main(b):  ### this one cause a memory "leak" !!!!! only when the pipe is on
    try:
        print "receiver :", os.getpid()
        i = 0
        #t=time.time() # for timing purpose
        while True:
            Data = b.recv()
            cmd = Data['a'].values[0]
            i += 1
            #gc.collect() # remove the memory leak, but slooooooow
            #if i%1000==0: # loop for timing purpose
                #t1=time.time()
                #print i
                #print (t1-t)/1000
                #t=t1
    except (Exception, KeyboardInterrupt) as e:
        print "Exception : ", e
        raise


try:
    p = Process(target=main, args=(b,))
    q = Process(target=sender, args=(a,))
    p.start()
    q.start()
except (Exception, KeyboardInterrupt) as e:
    print "Exception in main : ", e
    p.terminate()
    q.terminate()