I'm reading HDF5 data sets into memory via:

import h5py
with h5py.File('file.hdf5') as f:
    a = f['data'][:]

where 'a' can have up to 100 million entries. How can I query exactly how much memory (in MB/GB) this list is taking up?
h5py loads (most) values as numpy arrays. An array has a shape attribute and a dtype attribute.
For an array that I happen to have in my IPython session, I can get these attributes:
In [211]: X.shape,X.dtype
Out[211]: ((51, 13), dtype('float64'))
In [212]: X.size
Out[212]: 663
In [213]: X.size, X.itemsize
Out[213]: (663, 8)
In [214]: X.nbytes
Out[214]: 5304
The IPython whos command also gives me this information:
X ndarray 51x13: 663 elems, type `float64`, 5304 bytes
X also uses some memory to store attributes like these, but most of the memory use is in the data buffer, which in this case is 5304 bytes long.
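So to answer the question directly: read nbytes from the loaded array and divide by 1024**2 or 1024**3 to get MB or GB. A minimal sketch, reusing the 'file.hdf5' / 'data' names from the question:

import h5py

# load the dataset into memory as a numpy array
with h5py.File('file.hdf5', 'r') as f:
    a = f['data'][:]

# nbytes counts only the data buffer, not the small ndarray object overhead
print(a.nbytes, 'bytes')
print(a.nbytes / 1024**2, 'MB')
print(a.nbytes / 1024**3, 'GB')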
h5py might have some added information; I'd have to check its docs. But these are the numpy basics.
In the h5py docs I see that a Dataset has shape, size and dtype. I don't see nbytes or itemsize; you may have to infer those.
For a small sample file, I get (in an IPython session)
In [262]: y
Out[262]: <HDF5 dataset "y": shape (10,), type "<i4">
In [265]: y1=f['y'][:]
And the whos entries:
y Dataset <HDF5 dataset "y": shape (10,), type "<i4">
y1 ndarray 10: 10 elems, type `int32`, 40 bytes
y1 is an ndarray with all the attributes I described. y, unloaded, doesn't have nbytes, but that can be calculated from shape and dtype.
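If you want the size before reading the data in, a rough sketch of that calculation (again assuming the 'file.hdf5' / 'data' names from the question) is:

import h5py

with h5py.File('file.hdf5', 'r') as f:
    dset = f['data']
    # size (number of elements) times the dtype's itemsize gives the bytes
    # the data would occupy once loaded into a numpy array
    nbytes = dset.size * dset.dtype.itemsize
    print(nbytes / 1024**2, 'MB')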