
I am trying to write datasets to h5 file in the following way:

fpath = 'path-to-/data.h5'
with h5py.File(fpath,'w') as hf:
    hf.create_dataset('a', data=a)

Then I am appending to the file with more data in the same code:

with h5py.File(fpath,'a') as hf:
    dset = hf.create_dataset('b',(nrow,1),maxshape=(nrow,None),chunks=(nrow,1))
    for i in range(ncol):
        dset[:,-1:] = b
        if i+1 < ncol:
            dset.resize(dset.shape[1]+1,axis=1)

I get the following error against the second operation (append); the error message is French for 'No such file or directory':

OSError: Unable to create file (unable to open file: name = 'path-to-/data.h5', 
    errno = 2, error message = 'Aucun fichier ou dossier de ce type',
    flags = 13, o_flags = 242)

When I check the directory, the file path-to-/data.h5 exists but without the appended datasets (checked with list(hf.keys())).

To make all of this work, I am currently writing everything in one step and not using the with statement (as suggested in the question EDIT here).

hf = h5py.File(fpath,'w')
hf.create_dataset('a', data=a)
dset = hf.create_dataset('b',(nrow,1),maxshape=(nrow,None),chunks=(nrow,1))
for i in range(ncol):
    dset[:,-1:] = b
    if i+1 < ncol:
        dset.resize(dset.shape[1]+1,axis=1)
hf.close()

Here too, if I delete the written file and run the code again, I get the same error as above; the code only runs when I change the file name (e.g. 'data_1.h5'). I don't understand this part, as I anticipated that the operation h5py.File(fpath,'w') would be independent of whether the file already exists.

To summarise, the only way I have found to make the code work is to use the second approach (write without append) and not alter (rename or move) the file that is generated.

I could not find it here, but is there a way to force write and append to an h5 file irrespective of its existence or previous calls?

  • Did you use `hf.close()` in the first routine to create `dataset a`? It's not shown, so I have to ask. Behavior without `hf.close()` is unpredictable. – kcw78 May 01 '19 at 13:08
  • I recall that when we use the `with` statement, `hf.close()` is not necessary, as mentioned [here](https://www.pythonforthelab.com/blog/how-to-use-hdf5-files-in-python/#basic-saving-and-reading-data) – nish-ant May 01 '19 at 13:18
  • Sorry. Yes, you are correct. To confirm; did you check the h5 file after the first operation to verify it exists and has `dataset a`? – kcw78 May 01 '19 at 13:37
  • Yes, I checked it and the dataset `a` is stored. The error appears in 2 ways - (1) if I call the file in append mode, and (2) if I rename/move/delete the original file and run the code again (it fails unless I run the code with a different filename than the ones used before). – nish-ant May 01 '19 at 13:47
  • Your error message indicates a file access problem. What happens if you run the second code block (to create `dataset b`) with the 'w' option instead of the 'a' option? – kcw78 May 01 '19 at 13:52
  • It works fine for the second block but as I mentioned, the only issue there is that I don't understand why it breaks when I alter the written location/name of the file and run the code again. It runs correctly if and only if the file that was created in the first run is still there in the write path. This baffles me as in principle, the write mode should not care about the file and simply create a new file, or overwrite if the file is already existing. – nish-ant May 01 '19 at 14:11
  • @nish-ant It remains a bit unclear to me what was and was not resolved. Maybe you can edit your question? It would be best if you at the same time rework your question so that it is [minimal and reproducible](https://stackoverflow.com/help/mcve): some code that can be copy/pasted and tested. – Tom de Geus May 01 '19 at 14:13
  • @TomdeGeus Thanks for your comment. I mentioned the unresolved issue in the second-to-last paragraph. I am not sure how I can present reproducible code here, as I am working with large datasets over shared file systems. I will try to see if I can scale the data down locally to post a working code here. – nish-ant May 01 '19 at 14:21
  • @nish-ant The size of the datasets does not seem to matter for your question. You could have some fictitious data like `[1,2,3]`. – Tom de Geus May 01 '19 at 14:23

1 Answer


@nish-ant, I created a simple MCVE to demonstrate the 'w' and 'a' options with 2 simple datasets. It replicates your process (as I understand it) in 1 program. First I open the file with 'w' option, close, then reopen with 'a' option. It works as expected. Review and compare to your code. Maybe it will help you identify the file access error.
I also successfully tested with these file options:
1. 'w' for 1; then 'r+' for 2
2. 'a' for 1; then 'a' for 2

import h5py
import numpy as np

# create arrays to be saved
arr1 = np.arange(18.).reshape(3, 6)
arr2 = 2.0*arr1

fpath = 'SO_55936567_data.h5'

# first pass: 'w' creates (or overwrites) the file
with h5py.File(fpath, 'w') as h5f:
    h5f.create_dataset('a', data=arr1)
# no h5f.close() needed -- the with block closes the file

# second pass: reopen with 'a' and append a second dataset
with h5py.File(fpath, 'a') as h5f:
    h5f.create_dataset('b', data=arr2)

print('done')
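The same two-step pattern also works with a resizable dataset like the one in your question. Here is a sketch that reuses the column-by-column `resize` loop from your code inside a `with` block (the file name, `nrow`/`ncol` values, and the `col` array are made up for the example):

```python
import h5py
import numpy as np

fpath = 'SO_55936567_data2.h5'
nrow, ncol = 3, 4
col = np.arange(float(nrow)).reshape(nrow, 1)  # one column of data

# step 1: create the file with 'w' and write dataset 'a'
with h5py.File(fpath, 'w') as h5f:
    h5f.create_dataset('a', data=np.arange(18.).reshape(3, 6))

# step 2: reopen with 'a' and grow dataset 'b' one column at a time
with h5py.File(fpath, 'a') as h5f:
    dset = h5f.create_dataset('b', (nrow, 1), maxshape=(nrow, None),
                              chunks=(nrow, 1))
    for i in range(ncol):
        dset[:, -1:] = col                       # write into the last column
        if i + 1 < ncol:
            dset.resize(dset.shape[1] + 1, axis=1)  # add the next column

# verify both datasets survived the append
with h5py.File(fpath, 'r') as h5f:
    print(list(h5f.keys()))   # ['a', 'b']
    print(h5f['b'].shape)     # (3, 4)
```

If this runs cleanly on your system but your original code does not, the problem is in the file access itself (path, permissions, or a stale open handle on the shared file system), not in the `with`/`resize` combination.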
  • Thanks for the MCVE. I will test it on my end, but from a preliminary glance I think your code should work without error. I suspect that in my code the file operations are not "clean" enough because of the additional `resize` of the datasets that I am doing during the append stage. Perhaps the `h5` file is not closing properly when I use the combination of `with` and `resize`, and this throws an error when I run the same code again. I can test this using your MCVE. – nish-ant May 01 '19 at 21:12