
I have identified another memory leak in Pandas through this piece of code:

  import pandas as pd
  from numpy import array

  store = pd.HDFStore(hdf[0])
  par = store[hdf[1]][:, hdf[2]]
  store.close()

  for pixel in pix_fac.itervalues():  # Python 2 dict iteration
      fac = pixel[4][::2]
      meas = array(par.loc[fac])

100% of the computer's memory is reached within seconds, freezing everything. I am using Debian 2.30 on an Intel i5 with 8 GB RAM.

I believe this is related to the following questions:

memory leak in creating a buffer with pandas?

Memory leak using pandas dataframe

Does anyone know how I can deal with this leak? I really need to use the .loc method to retrieve specific parameters on each iteration.

phasselmann

1 Answer


You can try calling gc.collect() every once in a while.

Better yet, do par = par.T and select via par[fac]. That way you are not taking a cross-section every time; a cross-section by definition creates a copy, and it is easy to keep allocating memory when you hold references to those copies.
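A minimal sketch of the transpose trick. The frame and index list here are made up (your par comes from the HDFStore and fac from pixel[4][::2]), but the selection pattern is the same:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `par`: 1000 rows, 5 parameter columns.
par = pd.DataFrame(np.random.rand(1000, 5), index=np.arange(1000))

# Transpose ONCE, up front.
par_t = par.T

# Stand-in for pixel[4][::2]: a list of row labels to pull.
fac = np.arange(0, 100, 2)

# Column selection on the transposed frame instead of a row
# cross-section on the original; same values, transposed layout.
meas = np.array(par_t[fac])
```

Note that meas here is the transpose of what np.array(par.loc[fac]) would give, so any downstream code has to account for the flipped axes.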

Even better would be to refactor this calculation to avoid this type of per-iteration selection entirely and vectorize it.
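One way to vectorize, sketched with hypothetical stand-ins (here pix_fac maps straight to label lists, whereas in the question the labels sit at pixel[4]): gather every label you will need across all pixels, do a single .loc on the full frame, and then slice the small result per pixel.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's structures.
par = pd.DataFrame(np.random.rand(100, 3))
pix_fac = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}

# Collect all labels needed by any pixel (mirroring the [::2] step).
all_fac = np.unique(np.concatenate([v[::2] for v in pix_fac.values()]))

# ONE selection against the big frame instead of one per iteration.
sub = par.loc[all_fac]

# Per-pixel results carved out of the small intermediate frame.
meas = {k: sub.loc[v[::2]].to_numpy() for k, v in pix_fac.items()}
```

The single big lookup still copies, but only once, and the per-pixel slices then run against a frame a fraction of the original size.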

Jeff
  • Actually, I discovered that if I pass float values into .loc while the indexes are integers (or vice versa), pandas doesn't return an error but builds up memory. Strange behavior. – phasselmann Mar 26 '14 at 20:22
  • You shouldn't use float values as indexers; it works for now, but will be deprecated in the future. It's really hard to see what you are doing without more code / structures. – Jeff Mar 26 '14 at 20:36
  • Yes, I see. Using floats as an index was a mistake; I converted everything to integers. I only spotted the mistake by chance, which is why I did not highlight it in the question. – phasselmann Mar 27 '14 at 13:29