
I am using pandas.DataFrame in multi-threaded code (actually a custom subclass of DataFrame called Sound). I have noticed a memory leak: my program's memory usage grows gradually over about ten minutes until it reaches ~100% of my machine's memory, at which point it crashes.

I used objgraph to try to track down this leak, and found that the count of instances of my DataFrame subclass keeps going up even though it shouldn't: each thread, in its run method, creates an instance, does some calculations, saves the result to a file and exits... so no references should be kept.
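To make the setup concrete, here is a minimal sketch of the pattern described; the names (Worker, the samples argument, the output paths) are hypothetical:

```python
import threading
import pandas as pd

class Sound(pd.DataFrame):
    """Hypothetical stand-in for the DataFrame subclass described above."""

class Worker(threading.Thread):
    def __init__(self, samples, path):
        super().__init__()
        self.samples = samples
        self.path = path

    def run(self):
        # Create the frame, compute, save, and return: once run() exits,
        # nothing should be referencing `frame` or `result` anymore.
        frame = Sound({"amplitude": self.samples})
        result = frame - frame.mean()
        result.to_csv(self.path)

threads = [Worker(range(1000), "out_%d.csv" % i) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```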

Using objgraph I found that all the data frames in memory have a similar reference graph:

[Image: objgraph back-reference graph showing the chain of references keeping the data frames alive]

I have no idea whether that's normal or not... it looks like this is what is keeping my objects in memory. Any ideas, advice, or insights?
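For context, the objgraph diagnostics described above can be reproduced with something like this sketch (show_growth, by_type and show_backrefs are objgraph functions; the 'Sound' type name and the output filename are just this question's example):

```python
import objgraph

# Print the object types whose instance counts grew since the last call;
# a steadily climbing count for the DataFrame subclass signals the leak.
objgraph.show_growth(limit=10)

# Grab a few of the lingering instances and render the chain of
# references keeping them alive (writing the image requires graphviz).
leaked = objgraph.by_type('Sound')
if leaked:
    objgraph.show_backrefs(leaked[:3], max_depth=5, filename='backrefs.png')
```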

– sebpiq
  • Is it possible to include a short code snippet to replicate this? – Andy Hayden Jan 08 '13 at 21:12
  • Did you try running the garbage collector manually? If you have circular references, this could be required to release the memory. `import gc; gc.collect()` – lgautier Jan 08 '13 at 21:18
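For reference, the suggestion in the last comment amounts to something like the following, run e.g. at the end of each thread's run method:

```python
import gc

# Force a full collection of all generations; gc.collect() returns the
# number of unreachable objects found, which is handy for logging.
unreachable = gc.collect()
print("gc: collected %d unreachable objects" % unreachable)
```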

1 Answer


Confirmed that there's some kind of memory leak going on in the indexing infrastructure. It's not caused by the above reference graph. Let's move the discussion to GitHub (SO is for Q&A):

https://github.com/pydata/pandas/issues/2659

EDIT: this actually appears not to be a memory leak at all; it may instead have to do with how the OS handles memory allocation. Please have a look at the GitHub issue for more information.
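If the memory really is being held by the allocator rather than by Python objects, one way to test that hypothesis on Linux with glibc is to ask the allocator to hand freed memory back to the OS. This is a platform-specific sketch, not something taken from the linked issue:

```python
import ctypes

# Linux/glibc only: malloc_trim(0) asks the allocator to release freed
# memory back to the OS; it returns 1 if any memory was released.
libc = ctypes.CDLL("libc.so.6")
print("memory released:", bool(libc.malloc_trim(0)))
```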

– Wes McKinney
  • OK... collecting manually with gc actually seems to do the trick. I will confirm once I'm sure. – sebpiq Jan 09 '13 at 12:11
  • Is there a reason why the gen-2 GC was not run automagically by the Python runtime? – Fil Nov 27 '14 at 02:34