
I'm reading HDF5 datasets into memory via:

import h5py
with h5py.File('file.hdf5', 'r') as f:
    a = f['data'][:]

where 'a' can have up to 100 million entries. How can I query exactly how much memory, in MB or GB, this array is taking up?

1 Answer

h5py loads (most) values as numpy arrays. An array has shape and dtype attributes, along with size, itemsize, and nbytes.

For an array that I happen to have in my IPython session, I can get these attributes:

In [211]: X.shape,X.dtype
Out[211]: ((51, 13), dtype('float64'))

In [212]: X.size
Out[212]: 663

In [213]: X.size, X.itemsize
Out[213]: (663, 8)

In [214]: X.nbytes
Out[214]: 5304

The IPython whos command also gives me this information:

X   ndarray       51x13: 663 elems, type `float64`, 5304 bytes

X also uses some memory to store attributes like these, but most of the memory use is in its data buffer, which in this case is 5304 bytes long (663 elements × 8 bytes each).

h5py might have some added information; I'd have to check its docs. But these are the numpy basics.

In the h5py docs I see that a Dataset has shape, size and dtype attributes. I don't see nbytes or itemsize, so you may have to infer those.
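
If you only want the size and don't need the data in memory, you can compute the byte count from the attributes the Dataset does expose. A minimal sketch, assuming the file and dataset names from the question:

import h5py

with h5py.File('file.hdf5', 'r') as f:
    dset = f['data']
    # elements * bytes-per-element = bytes the loaded array will occupy
    nbytes = dset.size * dset.dtype.itemsize
    print(nbytes / 1024**2, 'MB')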


For a small sample file, I get (in an IPython session):

In [262]: y
Out[262]: <HDF5 dataset "y": shape (10,), type "<i4">
In [265]: y1=f['y'][:]

And the whos entries:

y             Dataset     <HDF5 dataset "y": shape (10,), type "<i4">
y1            ndarray     10: 10 elems, type `int32`, 40 bytes

y1 is an ndarray with all the attributes I described. y, the unloaded dataset, doesn't have nbytes, but that can be calculated from its shape and dtype.
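
So to answer the question in MB/GB terms, load the dataset and divide nbytes by the appropriate power of 1024. A rough sketch, again using the names from the question:

import h5py

with h5py.File('file.hdf5', 'r') as f:
    a = f['data'][:]              # loads the dataset into a numpy ndarray

print(a.nbytes)                   # exact size of the data buffer, in bytes
print(a.nbytes / 1024**2, 'MB')
print(a.nbytes / 1024**3, 'GB')

For 100 million float64 values that works out to 800 million bytes, roughly 763 MB.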
