
I have many HDF5 files in a directory and I want to concatenate all of them. I tried the following:

from glob import iglob
import shutil
import os

PATH = r'C:\Dropbox\data_files'

destination = open('data.h5','wb')
for filename in iglob(os.path.join(PATH, '*.h5')):
    shutil.copyfileobj(open(filename, 'rb'), destination)
destination.close()

However, this only creates an empty file. Each HDF5 file contains two datasets, but I only care about taking the second one (which is named the same thing in each) and adding it to a new file.

Is there a better way of concatenating HDF files? Is there a way to fix my method?

okarin
  • It's not so straightforward. Take a look here http://stackoverflow.com/questions/5346589/concatenate-a-large-number-of-hdf5-files and here: http://stackoverflow.com/questions/18492273/combining-hdf5-files – Trond Kristiansen Jul 03 '14 at 19:18
  • I've looked at that post but am not sure how exactly that method works. – okarin Jul 03 '14 at 19:22
  • Did you ever solve this? If you did, could you post a self-answer? – KobeJohn Mar 01 '16 at 07:08
  • If not, does [this](https://gist.github.com/zonca/8e0dda9d246297616de9) (from [this question](http://stackoverflow.com/q/5346589/377366)) solve it? – KobeJohn Mar 01 '16 at 07:31

1 Answer


You can combine IPython with the h5py module and the h5copy tool.

Once h5copy and h5py are installed, just open the IPython console in the folder where all your .h5 files are stored and use this code to merge them into an output.h5 file:

import h5py
import os

d_names = os.listdir(os.getcwd())
d_struct = {}  # here we will store the database structure
for i in d_names:
    f = h5py.File(i, 'r')
    d_struct[i] = list(f.keys())  # materialize the keys: the view is invalid after f closes
    f.close()

for i in d_names:
    for j in d_struct[i]:
        # IPython shell escape: h5copy copies dataset j from file i into output.h5
        !h5copy -i '{i}' -o 'output.h5' -s {j} -d {j}
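If you would rather stay in pure Python, h5py's `Group.copy` method can do the same job without shelling out to h5copy. Here is a minimal sketch (the helper name `merge_h5` is mine, and it assumes dataset names are unique across files; to grab only the one dataset you care about, pass just that key instead of copying everything):

```python
import os
import h5py

def merge_h5(filenames, out_path):
    """Copy every top-level object from each input file into out_path."""
    with h5py.File(out_path, 'w') as out:
        for name in filenames:
            with h5py.File(name, 'r') as src:
                for key in src.keys():
                    # copies the dataset (or group) wholesale into the output file
                    src.copy(key, out)

# merge every .h5 file in the current directory
d_names = [n for n in os.listdir(os.getcwd())
           if n.endswith('.h5') and n != 'output.h5']
merge_h5(d_names, 'output.h5')
```

This also avoids the original question's problem: byte-level concatenation with `shutil.copyfileobj` cannot produce a valid HDF5 file, because each file carries its own superblock and internal offsets, so the datasets must be copied through the HDF5 API instead.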
G M