
I am reading a chunk of data from a pytables.Table (version 3.1.1) using the read_where method on a big HDF5 file. The resulting numpy array is about 420 MB, yet the memory consumption of my Python process went up by 1.6 GB during the read_where call, and the memory is not released after the call finishes. Even deleting the array, closing the file, and deleting the HDF5 file handle does not free the memory.

How can I free this memory again?
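
A minimal sketch of the pattern in question (file name, node name and query condition are placeholders, not my actual code):

    import tables

    h5file = tables.open_file("big_file.h5", mode="r")   # placeholder file name
    table = h5file.root.data                              # placeholder table node

    # read_where materializes all matching rows as a numpy structured array
    result = table.read_where("(value > 0.5)")            # placeholder condition
    print(result.nbytes / 1e6, "MB in the result array")

    # attempted cleanup -- the process memory stays high after this
    del result
    h5file.close()
    del h5file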

Ben K.

2 Answers


The huge memory consumption is due to the fact that Python builds a lot of machinery around the data to facilitate its manipulation.

You can find a good explanation of why the memory use is maintained here and there (links found on this question). A good workaround is to open and manipulate your table in a subprocess with the multiprocessing module, so that the memory is returned to the OS when the subprocess exits; see the sketch below.
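
A sketch of that workaround, assuming a placeholder file name, node path and condition: the query runs in a child process, so whatever PyTables/HDF5 allocates on its behalf is freed when the child exits, and only the result array is sent back to the parent.

    import multiprocessing as mp
    import tables

    def query(filename, node, condition, queue):
        # runs in the child process; everything opened here dies with it
        with tables.open_file(filename, mode="r") as h5:
            table = h5.get_node(node)
            queue.put(table.read_where(condition))  # result is pickled back to the parent

    if __name__ == "__main__":
        q = mp.Queue()
        p = mp.Process(target=query,
                       args=("big_file.h5", "/data", "(value > 0.5)", q))
        p.start()
        result = q.get()   # fetch before join() so the child is not blocked on a full queue
        p.join()           # child exits here; its memory is released

For very large results, shipping the array back through a Queue has its own cost; having the child write the result to a temporary file or array is a common alternative.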

Math

We would need more context on the details of your Table object, such as how large it is and its chunk size. How HDF5 handles chunking is probably one of the main culprits for hogging memory in this case.

My advice is to have a thorough read of http://pytables.github.io/usersguide/optimization.html#understanding-chunking and to experiment with different chunk sizes (typically making them larger).
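
For example, chunking can be influenced at table-creation time, either indirectly via expectedrows or directly via chunkshape (a sketch with placeholder names, description and sizes, not taken from the question):

    import tables

    class Row(tables.IsDescription):
        value = tables.Float64Col()

    with tables.open_file("big_file.h5", mode="w") as h5:
        # expectedrows lets PyTables pick a sensible chunk size automatically ...
        t1 = h5.create_table("/", "auto_chunks", Row, expectedrows=10**9)
        # ... or the chunk size can be set explicitly (rows per chunk)
        t2 = h5.create_table("/", "big_chunks", Row, chunkshape=(2**16,))
        print(t1.chunkshape, t2.chunkshape)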

Francesc
  • I do provide a guess during table creation: expectedrows=10**9 rows in the Table, but the actual number may vary by a factor of 10. I will look into the docs more deeply for this. However, what I did not see is how to free the memory after the table has been closed. – Ben K. Jun 19 '14 at 08:04