
I have several .h5 files (one file per day), each containing a single dataset. I tried to combine them into one large .h5 file using the h5py package with the following code:

```python
import h5py

with h5py.File(combined_file_path, "w") as combined_file:
    for file_path in file_paths:
        with h5py.File(file_path, "r") as data_file:
            data_file.copy('/', combined_file)
```

The variable `file_paths` is a list containing the paths to all .h5 files. The last line raises the following error: `ValueError: No destination name specified (no destination name specified)`. What is the actual problem with that line, and how can it be fixed?

vk21
    Does this answer your question? [How to copy a dataset object to a different hdf5 file using pytables or h5py?](https://stackoverflow.com/questions/53455713/how-to-copy-a-dataset-object-to-a-different-hdf5-file-using-pytables-or-h5py) – Matt Pitkin Jul 04 '23 at 14:40
  • I cannot get it working, as I am having trouble figuring out which paths need to be specified in the line `f_src.copy(f_src["/path/to/DataSet"],f_dest["/another/path"],"DataSet")`. – vk21 Jul 04 '23 at 15:01
  • ...I get this error: `KeyError: "Unable to open object"` – vk21 Jul 04 '23 at 15:11
  • Where did you find the documentation for this `h5py.File` `copy` method? – hpaulj Jul 04 '23 at 18:39

1 Answer


There are multiple approaches to "copy" data between files. I used quotes because you may not want to actually copy the data -- the target file can grow very large. The right approach also depends on the dataset names: if the dataset in each daily file has the same name, you will need a new dataset naming convention in the merged file.

I wrote an extended discussion that explains the various approaches here: How can I combine multiple .h5 files? In short, there are several options:

  • Create External Links (Method 1)
  • Copy Data 'as-is' (Methods 2a and 2b)
  • Merge all data into 1 Fixed size Dataset (Method 3a)
  • Merge all data into 1 Resizeable Dataset (Method 3b)

For your scenario, I prefer external links. That way you keep the daily files, but can access all of the data through the external links in the "merged" file.
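A minimal sketch of the external-link approach with `h5py.ExternalLink` (the file names, dataset name `data`, and link names `day_0`/`day_1` are made up for illustration; substitute your own):

```python
import h5py
import numpy as np

# Stand-ins for the real daily files (assumed names for this sketch)
daily_files = ["day1.h5", "day2.h5"]
for i, path in enumerate(daily_files):
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.arange(5) + i)

# The "merged" file only stores links, so it stays tiny
with h5py.File("combined.h5", "w") as combined:
    for i, path in enumerate(daily_files):
        # Each link needs a unique name in the combined file
        combined[f"day_{i}"] = h5py.ExternalLink(path, "/data")

# Reading through a link transparently opens the daily file
with h5py.File("combined.h5", "r") as combined:
    print(combined["day_1"][:])  # -> [1 2 3 4 5]
```

Note that the daily files must stay in place (and findable at the stored paths) for the links to resolve.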

kcw78
  • Thanks for your answer. I indeed want to copy the data of every .h5 file into one large file, even if the file gets very large. What would you prefer then? Afterwards, I read the file into a pandas dataframe. As there is a time tag in the data, I do not need to maintain any daily file structure. – vk21 Jul 04 '23 at 15:32
  • The "best method" depends on your workflow. As I mentioned, I like external links. Method 2a using the h5py object `.copy()` command is the next easiest to implement. – kcw78 Jul 04 '23 at 17:33
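To illustrate Method 2a: the `ValueError` in the question occurs because copying the root group `'/'` gives h5py no name to derive for the destination object, so `copy()` needs an explicit, unique `name` per source file. A minimal sketch (file and dataset names are made up for illustration):

```python
import h5py
import numpy as np

# Stand-ins for the real daily files (assumed names for this sketch)
daily_files = ["dayA.h5", "dayB.h5"]
for i, path in enumerate(daily_files):
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=np.full(3, i))

with h5py.File("combined_copy.h5", "w") as combined:
    for i, path in enumerate(daily_files):
        with h5py.File(path, "r") as day:
            # Copy the dataset itself (not '/') and give it a unique
            # destination name -- this avoids the ValueError and the
            # name collision between identically named daily datasets
            day.copy("data", combined, name=f"data_{i}")

with h5py.File("combined_copy.h5", "r") as combined:
    print(sorted(combined.keys()))  # -> ['data_0', 'data_1']
```

Unlike the external-link approach, this physically duplicates the data, so the combined file grows to roughly the sum of the daily files.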