
I'm using pandas to implement a ring buffer, but the memory use keeps growing. What am I doing wrong?

Here is the code (edited slightly from the first version of this question):

import pandas as pd
import numpy as np
import resource


# ring buffer: 10,000 rows, refreshed 1,000 rows at a time
tempdata = np.zeros((10000, 3))
tdf = pd.DataFrame(data=tempdata, columns=['a', 'b', 'c'])

i = 0
while True:
    i += 1
    littledf = pd.DataFrame(np.random.rand(1000, 3), columns=['a', 'b', 'c'])
    # drop the oldest 1000 rows and append the new block
    tdf = pd.concat([tdf[1000:], littledf], ignore_index=True)
    del littledf
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory:%d kb' % (int(currentmemory) / 1000)

This is what I get:

total memory:37945 kb
total memory:38137 kb
total memory:38137 kb
total memory:38768 kb
total memory:38768 kb
total memory:38776 kb
total memory:38834 kb
total memory:38838 kb
total memory:38838 kb
total memory:38850 kb
total memory:38854 kb
total memory:38871 kb
total memory:38871 kb
total memory:38973 kb
total memory:38977 kb
total memory:38989 kb
total memory:38989 kb
total memory:38989 kb
total memory:39399 kb
total memory:39497 kb
total memory:39587 kb
total memory:39587 kb
total memory:39591 kb
total memory:39604 kb
total memory:39604 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39608 kb
total memory:39612 kb

Not sure if it's related to this issue:

https://github.com/pydata/pandas/issues/2659

Tested on a MacBook Air with Anaconda Python.
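One measurement caveat (an aside, not from the original post): `getrusage` reports `ru_maxrss` in bytes on macOS but in kilobytes on Linux, so readings from the two machines aren't directly comparable. A small normalizing helper, as a sketch:

```python
import resource
import sys


def maxrss_kib():
    """Peak resident set size in KiB, normalizing the platform units.

    macOS reports ru_maxrss in bytes; Linux reports it in kilobytes.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == 'darwin' else rss


print('total memory:%d kb' % maxrss_kib())
```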

Fra
  • Weirdly, I copy and paste this code and see no leak, on 0.12 and 0.13rc. – Andy Hayden Dec 22 '13 at 07:32
  • I added what I get (and changed the code a little bit). Do you get the same or something different? – Fra Dec 22 '13 at 08:00
  • I get "total memory:59 kb" all the way down. Perhaps it's OS/setup; maybe add some more details. This could be better as a separate GitHub issue, though. Have you tried adding gc.collect as in the other issue? – Andy Hayden Dec 22 '13 at 08:05
  • You may be right. I'm testing it on an Ubuntu server and the memory seems to stay the same... Weird. – Fra Dec 22 '13 at 08:17
  • To locate the leak, try using [memory_profiler](https://pypi.python.org/pypi/memory_profiler) – albus_c Apr 01 '14 at 09:47
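Following up on the profiling suggestion: `memory_profiler` is one option; the standard library's `tracemalloc` (Python 3.4+) can also localize allocation growth. A minimal sketch, here exercising a plain list as a stand-in for the DataFrame loop:

```python
import tracemalloc

tracemalloc.start()

data = []
for _ in range(1000):
    data.append([0.0] * 100)  # stand-in for the per-iteration DataFrame churn

current, peak = tracemalloc.get_traced_memory()
print('current: %d KiB, peak: %d KiB' % (current // 1024, peak // 1024))

# show the top allocation sites, grouped by source line
for stat in tracemalloc.take_snapshot().statistics('lineno')[:3]:
    print(stat)

tracemalloc.stop()
```

Wrapping the original while-loop body the same way would show whether the growth comes from `pd.concat` or somewhere else.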

1 Answer


Instead of using `concat`, why not update the DataFrame in place? `i % 10` determines which 1000-row slot you write to on each update.

i = 0
while True:
    i += 1
    # overwrite one 1000-row slot per iteration; i % 10 cycles through the 10 slots
    tdf.iloc[1000 * (i % 10):1000 + 1000 * (i % 10)] = np.random.rand(1000, 3)
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if i % 1000 == 0:
        print 'total memory:%d kb' % (int(currentmemory) / 1000)
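To make the slot arithmetic concrete, here is a bounded version of the same idea (the names `N_SLOTS` and `SLOT` are my own, not from the answer): successive values of `i` cycle through ten fixed 1000-row slots, so the buffer is overwritten in place and its backing memory is never reallocated:

```python
import numpy as np
import pandas as pd

N_SLOTS, SLOT = 10, 1000
tdf = pd.DataFrame(np.zeros((N_SLOTS * SLOT, 3)), columns=['a', 'b', 'c'])

for i in range(1, 26):  # 25 updates, wrapping around the buffer 2.5 times
    start = SLOT * (i % N_SLOTS)       # e.g. i=11 -> rows 1000:2000 again
    tdf.iloc[start:start + SLOT] = np.random.rand(SLOT, 3)

# the DataFrame's shape (and allocation) never changes
print(tdf.shape)  # prints (10000, 3)
```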
mtadd