
I have an HDF5 file with several keys. I want to update the values of all the keys after doing some adjustment. How can I do that?

Here is my code:

import h5py
import numpy as np

hf = h5py.File(fileName, "r+")
keys = hf.keys()
for i in keys:
    df = hf[i].value
    df = df - np.mean(df)
hf.close()

But when I then read this file in "r" mode, it still shows the same mean, which means the file hasn't been updated. Any idea where it might be going wrong?

The above issue is fixed in the answer provided. However, I am now trying to calibrate the resulting data. The code I am using is:

data = hf[key][...]
hf[key][...] = data*calibration_factor

Just a simple calibration. However, it results in all zeros in hf[key][...]. Any possible solution for this? I am struggling a lot with this; thanks for understanding.

SudipM
  • Thanks, I saw that earlier and have just looked at it again; yes, it's almost the same, but could you please explain what that [...] means, since I have to use it multiple times. – SudipM Jul 09 '18 at 13:49
  • Does this help: https://stackoverflow.com/questions/42190783/what-does-three-dots-in-python-mean-when-indexing-what-looks-like-a-number? – Tom de Geus Jul 09 '18 at 13:59
  • To add to my previous comment: it's not a straight assignment but df=df-np.mean(df), so how do I use [...] in this case? – SudipM Jul 09 '18 at 14:00
  • I have posted a complete answer with an explanation. Maybe just a comment: I think that by 'key' you actually mean 'dataset'. – Tom de Geus Jul 09 '18 at 14:27

1 Answer


As indicated in this answer: you want to assign values, not create a dataset. The latter would not work here in any case, as the dataset already exists.
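
To make the distinction concrete, here is a minimal sketch (the file name demo.hdf5 is made up for illustration):

import h5py
import numpy as np

hf = h5py.File('demo.hdf5', 'w')

hf['/foo'] = np.arange(5)            # creates a new dataset named 'foo'

# hf['/foo'] = np.arange(5)          # would fail: the dataset 'foo' already exists

hf['/foo'][...] = np.arange(5) * 2   # assigns new values into the existing dataset

hf.close()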

To assign values you can use Python ellipsis indexing (the ... indexing):

import h5py
import numpy as np

# create some file
# ----------------

hf = h5py.File('example.hdf5', 'w')

hf['/foo'] = np.random.random(100)
hf['/bar'] = np.random.random(100) + 10.

hf.close()

# set all datasets to have a zero mean
# ------------------------------------

hf = h5py.File('example.hdf5', 'r+')

for key in hf:

  data = hf[key][...]

  hf[key][...] = data - np.mean(data)

hf.close()

# verify
# ------

hf = h5py.File('example.hdf5', 'r')

for key in hf:

  print(key, hf[key][...].shape, np.mean(hf[key][...]))

hf.close()

How exactly the ... works depends on the class/library that you are using, in particular on how __getitem__ (and __setitem__) are implemented. For h5py you can consult the documentation, which gives some insight, look at this discussion, or search for other good references (which surely exist). What I can tell you in this context is that ... can be used both to read and to assign the values of a dataset. This is illustrated above, where ... has been used as an alternative to your .value operator.
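
As a rough illustration of what the ... syntax translates to (simplified, reusing the example.hdf5 file created above):

import h5py
import numpy as np

hf = h5py.File('example.hdf5', 'r+')

ds = hf['foo']            # an h5py Dataset object: no data has been read yet

data = ds[...]            # calls ds.__getitem__(Ellipsis): reads the whole dataset
                          # from disk into a NumPy array in memory

ds[...] = data * 2.       # calls ds.__setitem__(Ellipsis, value): writes the array
                          # back into the existing dataset on disk

hf.close()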

What went wrong in your example is that you assumed df was a pointer to the data. This is not the case: it is a copy. In fact, df lives in memory, while the data stored in the file lives on disk. So modifying df will not do anything to your file (which is the wanted behaviour in many cases). You need to actively modify the file's contents, as is done in this answer.
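
To illustrate the difference (a minimal sketch, again using the example.hdf5 file from above):

import h5py
import numpy as np

hf = h5py.File('example.hdf5', 'r+')

df = hf['foo'][...]       # df is a NumPy copy of the data, living in memory

df = df - np.mean(df)     # this only modifies the in-memory copy;
                          # the dataset in the file is untouched

hf['foo'][...] = df       # only this line actually writes the result back to disk

hf.close()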

A final note: this code is very simple. For example, it works only for files without groups. If you want to be more general you would have to include some check(s), as sketched below.
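
For example, one way to generalise to files that do contain groups is to walk the file with visititems and only touch objects that are actually datasets (a sketch; the function name zero_mean_all_datasets is made up for illustration):

import h5py
import numpy as np

def zero_mean_all_datasets(filename):
    # subtract the mean from every dataset in the file, also inside groups
    with h5py.File(filename, 'r+') as hf:

        names = []

        def collect(name, obj):
            # groups are skipped: only h5py.Dataset objects hold data
            if isinstance(obj, h5py.Dataset):
                names.append(name)

        hf.visititems(collect)

        for name in names:
            data = hf[name][...]
            hf[name][...] = data - np.mean(data)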

Tom de Geus
  • Thanks a lot, it worked that way. I am facing another similar difficulty: data = hf[key][...]; hf[key][...] = data*calibration_factor, for the simple calculation where the data is to be calibrated by a factor. Unfortunately this results in all zeros in the resulting dataset, hf[key][...]. However, if I assign it to any other variable it works fine. Any idea what the issue is? – SudipM Jul 09 '18 at 18:05
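
Regarding the calibration issue in this last comment: one possible cause (an assumption, since the dataset's dtype is not shown) is that the dataset on disk has an integer dtype. On assignment h5py casts the values to the dataset's dtype, so multiplying by a calibration factor smaller than one would truncate everything to zero, while the in-memory result (assigned to another variable) stays floating point. A sketch of how one might check and work around this, with a made-up calibration_factor and assuming a file without groups as above:

import h5py
import numpy as np

calibration_factor = 0.001  # hypothetical value, for illustration only

with h5py.File('example.hdf5', 'r+') as hf:
    for key in list(hf):
        ds = hf[key]
        data = ds[...].astype('float64') * calibration_factor

        if np.issubdtype(ds.dtype, np.integer):
            # the on-disk dataset cannot hold floats: replace it with a float dataset
            del hf[key]
            hf.create_dataset(key, data=data)
        else:
            ds[...] = data

Whether replacing the dataset or writing the calibrated values to a separate dataset is appropriate depends on your file layout.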