As indicated in this answer, you want to assign values, not create a dataset. Creating one would not work here in any case, because the datasets already exist.
To assign values you can use Python ellipsis indexing (the ... indexing):
import h5py
import numpy as np
# create some file
# ----------------
hf = h5py.File('example.hdf5', 'w')
hf['/foo'] = np.random.random(100)
hf['/bar'] = np.random.random(100) + 10.
hf.close()
# set all datasets to have a zero mean
# ------------------------------------
hf = h5py.File('example.hdf5', 'r+')
for key in hf:
    data = hf[key][...]
    hf[key][...] = data - np.mean(data)
hf.close()
# verify
# ------
hf = h5py.File('example.hdf5', 'r')
for key in hf:
    print(key, hf[key][...].shape, np.mean(hf[key][...]))
hf.close()
How exactly the ... works depends on the class/library that you are using, in particular on how __getitem__ is implemented. For h5py you can consult the documentation, which gives some insight, look at this discussion, or search for other good references (that surely exist). What I can tell you in this context is that ... can be used to both read and assign the values of a dataset. This is illustrated above, where ... has been used as an alternative to your .value attribute.
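To make the role of __getitem__ (and its counterpart __setitem__) more concrete, here is a minimal sketch in plain Python. The class Store is made up for illustration and is not part of h5py; it only shows that obj[...] passes the built-in Ellipsis object to these methods, and that the class decides what to do with it.

```python
class Store:
    """Hypothetical container; its internal list stands in for data on disk."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, index):
        # obj[...] arrives here with index being the built-in Ellipsis
        if index is Ellipsis:
            return list(self._values)  # hand back a copy of everything
        return self._values[index]

    def __setitem__(self, index, new_values):
        # obj[...] = x arrives here, again with index being Ellipsis
        if index is Ellipsis:
            self._values[:] = new_values  # overwrite the stored data
        else:
            self._values[index] = new_values


store = Store([1.0, 2.0, 3.0])

copy = store[...]   # read: we get a copy
copy[0] = 99.0      # changing the copy does not touch the store

store[...] = [v - 2.0 for v in store[...]]  # explicit write-back
```

How h5py implements this internally is different, of course; the sketch only illustrates the mechanism.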
What went wrong in your example is that you assumed that df was a pointer to the data. This is not the case: it is a copy. In fact, df lives in memory, while the data stored in the file lives on disk. So modifying df will not do anything to your file (which is the desired behaviour in many cases). You need to actively modify the file's content, as is done in this answer.
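The difference between the in-memory copy and the file can be checked directly. The following is a small sketch; the file name copy_demo.hdf5 and the temporary directory are arbitrary choices for the illustration, and df simply mirrors the name from your question.

```python
import os
import tempfile

import h5py
import numpy as np

# throw-away file location (arbitrary, for illustration only)
path = os.path.join(tempfile.mkdtemp(), 'copy_demo.hdf5')

with h5py.File(path, 'w') as hf:
    hf['/foo'] = np.arange(5, dtype=float)

with h5py.File(path, 'r+') as hf:
    df = hf['/foo'][...]         # NumPy array in memory: a copy
    df += 100.0                  # modifies only the copy
    unchanged = hf['/foo'][...]  # the file still holds the old values
    hf['/foo'][...] = df         # explicit write-back to the file

with h5py.File(path, 'r') as hf:
    written = hf['/foo'][...]    # now the file holds the new values

print(unchanged)
print(written)
```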
A final note: this code is very simple. For example, it works only for files without groups, because it loops over the top-level keys and treats each of them as a dataset. If you want to be more general, you would have to include some check(s).
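One possible way to add such a check, as a sketch only: walk the whole file with visititems and act only on objects that are an h5py.Dataset. The file layout (one dataset at the root, one inside a group called group) and the helper name subtract_mean are assumptions made for this illustration, and the sketch further assumes that all datasets are numeric.

```python
import os
import tempfile

import h5py
import numpy as np

# throw-away file with one top-level dataset and one nested in a group
path = os.path.join(tempfile.mkdtemp(), 'groups_demo.hdf5')

with h5py.File(path, 'w') as hf:
    hf['/foo'] = np.random.random(100)
    group = hf.create_group('group')
    group.create_dataset('bar', data=np.random.random(100) + 10.)


def subtract_mean(name, obj):
    """Hypothetical helper: zero the mean of datasets, skip groups."""
    if isinstance(obj, h5py.Dataset):
        data = obj[...]
        obj[...] = data - np.mean(data)
    # returning None tells visititems to continue with the next object


with h5py.File(path, 'r+') as hf:
    hf.visititems(subtract_mean)

# verify
with h5py.File(path, 'r') as hf:
    means = {name: float(np.mean(hf[name][...])) for name in ('foo', 'group/bar')}

print(means)
```

Files that also contain non-numeric datasets (strings, compound types) would need an extra check, for example on obj.dtype.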