I'm facing a memory-leak problem when using the pandas
library in Python. I create pandas.DataFrame
objects in my class, and I have a method that changes the DataFrame size according to my conditions. After changing the size and creating a new pandas object, I overwrite the original pandas.DataFrame in my class. But memory usage stays very high, even after significantly reducing the initial table. Some code for a short example (I didn't write a process-memory monitor; I watch the numbers in Task Manager):
import time, string, pandas, numpy, gc

class temp_class(object):
    def __init__(self, nrow=1000000, ncol=4, timetest=5):
        self.nrow = nrow
        self.ncol = ncol
        self.timetest = timetest

    def createDataFrame(self):
        print('Check memory before dataframe creating')
        time.sleep(self.timetest)
        # nrow x ncol table of float64 values with a random float64 index;
        # string.letters is Python 2 (string.ascii_letters on Python 3)
        self.df = pandas.DataFrame(numpy.random.randn(self.nrow, self.ncol),
                                   index=numpy.random.randn(self.nrow),
                                   columns=list(string.letters[0:self.ncol]))
        print('Check memory after dataframe creating')
        time.sleep(self.timetest)

    def changeSize(self, from_=0, to_=100):
        # keep only the first 100 rows as an independent copy
        df_new = self.df[from_:to_].copy()
        print('Check memory after changing size')
        time.sleep(self.timetest)

        print('Check memory after deleting initial pandas object')
        del self.df
        time.sleep(self.timetest)

        print('Check memory after deleting copy of reduced pandas object')
        del df_new
        gc.collect()
        time.sleep(self.timetest)

if __name__ == '__main__':
    a = temp_class()
    a.createDataFrame()
    a.changeSize()
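
For completeness, here is a minimal sketch of reading the numbers programmatically instead of from Task Manager; it assumes the third-party psutil package (a reasonably recent version) is installed, and the report_memory helper is hypothetical, just for illustration:

import os, psutil

def report_memory(label):
    # resident set size of the current process, in MB
    rss = psutil.Process(os.getpid()).memory_info().rss
    print('%s: %.1f MB' % (label, rss / (1024.0 * 1024.0)))

report_memory('Before dataframe creating')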
Before creating the DataFrame I have approx. 15 MB of memory usage.
After creating it: 67 MB.
After changing the size: 67 MB.
After deleting the original DataFrame: 35 MB.
After deleting the reduced copy: 31 MB.
So 16 MB more than the starting point are still held, even though both objects are gone?
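
For scale, here is a back-of-the-envelope sketch of what the raw numpy arrays behind the frame should occupy, assuming 8 bytes per float64 value:

nrow, ncol = 1000000, 4
values_mb = nrow * ncol * 8 / (1024.0 * 1024.0)  # float64 values: ~30.5 MB
index_mb = nrow * 8 / (1024.0 * 1024.0)          # float64 index:   ~7.6 MB
print(values_mb + index_mb)                      # ~38.1 MB of raw array data

So the jump from 15 MB to 67 MB at creation looks like roughly 38 MB of array data plus interpreter and pandas overhead, but that still doesn't tell me what keeps holding the last 16 MB once both frames are deleted.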
I use Python 2.7.2 (x32) on a Windows 7 (x64) machine; pandas.__version__ is 0.7.3 and numpy.__version__ is 1.6.1.