
I am reading a chunk of data from a pytables.Table (version 3.1.1) using the read_where method on a big HDF5 file. The resulting numpy array is about 420 MB, yet the memory consumption of my Python process went up by 1.6 GB during the read_where call, and the memory is not released after the call finishes. Even deleting the array, closing the file, and deleting the HDF5 file handle does not free the memory.

How can I free this memory again?
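
A minimal sketch of the pattern in question (file name, node name and query condition are placeholders, not my actual code):

    import tables

    h5file = tables.open_file("big_file.h5", mode="r")   # placeholder file name
    table = h5file.root.data                              # placeholder table node

    # read_where materializes all matching rows as a numpy structured array
    result = table.read_where("(value > 0.5)")            # placeholder condition
    print(result.nbytes / 1e6, "MB in the result array")

    # attempted cleanup -- the process memory stays high after this
    del result
    h5file.close()
    del h5file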

Ben K.

2 Answers


The huge memory consumption is due to the fact that Python builds a lot of machinery around the data to facilitate its manipulation.

You can find a good explanation of why the memory use is maintained here and there (links found on this question). A good workaround is to open and manipulate your table in a subprocess with the multiprocessing module, so that the memory is returned to the OS when the subprocess exits; see the sketch below.
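
A sketch of that workaround, assuming a placeholder file name, node path and condition: the query runs in a child process, so whatever PyTables/HDF5 allocates on its behalf is freed when the child exits, and only the result array is sent back to the parent.

    import multiprocessing as mp
    import tables

    def query(filename, node, condition, queue):
        # runs in the child process; everything opened here dies with it
        with tables.open_file(filename, mode="r") as h5:
            table = h5.get_node(node)
            queue.put(table.read_where(condition))  # result is pickled back to the parent

    if __name__ == "__main__":
        q = mp.Queue()
        p = mp.Process(target=query,
                       args=("big_file.h5", "/data", "(value > 0.5)", q))
        p.start()
        result = q.get()   # fetch before join() so the child is not blocked on a full queue
        p.join()           # child exits here; its memory is released

For very large results, shipping the array back through a Queue has its own cost; having the child write the result to a temporary file or array is a common alternative.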

Math

We would need more context on the details of your Table object, such as how large it is and its chunk size. How HDF5 handles chunking is probably one of the main culprits for hogging memory in this case.

My advice is to have a thorough read of http://pytables.github.io/usersguide/optimization.html#understanding-chunking and to experiment with different chunk sizes (typically making them larger).
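
For example, chunking can be influenced at table-creation time, either indirectly via expectedrows or directly via chunkshape (a sketch with placeholder names, description and sizes, not taken from the question):

    import tables

    class Row(tables.IsDescription):
        value = tables.Float64Col()

    with tables.open_file("big_file.h5", mode="w") as h5:
        # expectedrows lets PyTables pick a sensible chunk size automatically ...
        t1 = h5.create_table("/", "auto_chunks", Row, expectedrows=10**9)
        # ... or the chunk size can be set explicitly (rows per chunk)
        t2 = h5.create_table("/", "big_chunks", Row, chunkshape=(2**16,))
        print(t1.chunkshape, t2.chunkshape)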

Francesc
  • I do provide a guess during table creation: expectedrows=10**9 rows in the Table, but the actual number may vary by a factor of 10. I will look into the docs more deeply for this. However, what I did not see is how to free the memory after the table has been closed. – Ben K. Jun 19 '14 at 08:04