I'm not familiar with pandas, so I can't help there. This can be done with h5py or PyTables. As @hpaulj mentioned, the process reads the dataset into a NumPy array, then writes it to an HDF5 dataset with h5py. The exact process depends on the dataset's maxshape attribute, which controls whether the dataset can be resized.
I created examples to show both methods (fixed-size or resizable dataset). The first method creates a new file, file3, that combines the values from file1 and file2. The second method appends the values from file2 to file1e (which is resizable). Note: code to create the files used in the examples is at the end.
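To see which case applies to a given dataset, you can inspect maxshape before copying anything. This is only a minimal sketch: the helper name is_resizable and the throwaway file and dataset names are my own, chosen for illustration, and are not part of h5py.

```python
import os
import tempfile

import h5py
import numpy as np


def is_resizable(dset, axis=0):
    """Return True if `dset` can grow along `axis` (hypothetical helper)."""
    limit = dset.maxshape[axis]
    # None means unlimited; otherwise the axis can only grow up to `limit`.
    return limit is None or limit > dset.shape[axis]


# Demo on a throwaway file so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "maxshape_demo.h5")
with h5py.File(demo_path, "w") as h5f:
    # Without maxshape=, the dataset is fixed at its creation shape.
    fixed = h5f.create_dataset("fixed", data=np.zeros((5, 3)))
    # With maxshape=(None, 3), axis 0 is unlimited.
    growable = h5f.create_dataset("growable", data=np.zeros((5, 3)),
                                  maxshape=(None, 3))
    fixed_ok = is_resizable(fixed)
    growable_ok = is_resizable(growable)
    print(fixed.maxshape, fixed_ok)
    print(growable.maxshape, growable_ok)
```

If the check fails, the data has to go into a new dataset; if it passes, the existing dataset can be resized in place.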
I have a longer answer on SO that shows all the ways to copy data.
See this Answer: How can I combine multiple .h5 file?
Method 1: Combine datasets into a new file
Required when the datasets were not created with the maxshape= parameter.
import h5py

with h5py.File('file1.h5','r') as h5f1, \
     h5py.File('file2.h5','r') as h5f2, \
     h5py.File('file3.h5','w') as h5f3 :

    print (h5f1['ds_1'].shape, h5f1['ds_1'].maxshape)
    print (h5f2['ds_2'].shape, h5f2['ds_2'].maxshape)

    # row counts of the two source datasets and of the combined dataset
    arr1_a0 = h5f1['ds_1'].shape[0]
    arr2_a0 = h5f2['ds_2'].shape[0]
    arr3_a0 = arr1_a0 + arr2_a0

    # create the new dataset large enough for both (3 columns, as in the example data)
    h5f3.create_dataset('ds_3', dtype=h5f1['ds_1'].dtype,
                        shape=(arr3_a0,3), maxshape=(None,3))

    # read each source dataset into a numpy array, then write it to its slice of ds_3
    xfer_arr1 = h5f1['ds_1'][:]
    h5f3['ds_3'][0:arr1_a0, :] = xfer_arr1

    xfer_arr2 = h5f2['ds_2'][:]
    h5f3['ds_3'][arr1_a0:arr3_a0, :] = xfer_arr2

    print (h5f3['ds_3'].shape, h5f3['ds_3'].maxshape)
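If you have more than two files, the same idea (size the new dataset first, then copy each source into its slice) can be wrapped in a function. This is a rough sketch, not a tested utility: combine_datasets is a name I made up, and it assumes every source is a 2-D dataset with the same dtype and column count.

```python
import os
import tempfile

import h5py
import numpy as np


def combine_datasets(sources, out_path, out_name):
    """Stack 2-D datasets row-wise into a new file (hypothetical helper).

    `sources` is a non-empty list of (file_path, dataset_name) pairs.
    Assumes all sources share the same dtype and number of columns.
    """
    # First pass: collect row counts so the output can be sized up front.
    rows = []
    for path, name in sources:
        with h5py.File(path, "r") as h5in:
            rows.append(h5in[name].shape[0])
            ncols, dtype = h5in[name].shape[1], h5in[name].dtype

    # Second pass: create the output dataset and copy each source into its slice.
    with h5py.File(out_path, "w") as h5out:
        out_ds = h5out.create_dataset(out_name, shape=(sum(rows), ncols),
                                      dtype=dtype, maxshape=(None, ncols))
        start = 0
        for (path, name), nrows in zip(sources, rows):
            with h5py.File(path, "r") as h5in:
                out_ds[start:start + nrows, :] = h5in[name][:]
            start += nrows


# Demo with throwaway files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
arr1 = np.array([[1, 3, 5], [5, 4, 9], [6, 8, 0], [7, 2, 5], [2, 1, 2]])
arr2 = np.array([[6, 1, 9], [8, 2, 7]])
path1 = os.path.join(tmp_dir, "a.h5")
path2 = os.path.join(tmp_dir, "b.h5")
path3 = os.path.join(tmp_dir, "combined.h5")
with h5py.File(path1, "w") as h5f:
    h5f.create_dataset("ds_1", data=arr1)
with h5py.File(path2, "w") as h5f:
    h5f.create_dataset("ds_2", data=arr2)

combine_datasets([(path1, "ds_1"), (path2, "ds_2")], path3, "ds_3")

with h5py.File(path3, "r") as h5f:
    combined = h5f["ds_3"][:]
print(combined.shape)
```

Each source is read fully into memory before it is written, which is fine for small datasets; very large datasets would need to be copied in blocks instead.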
Method 2: Append file2 dataset to file1e dataset
The dataset in file1e must have been created with the maxshape= parameter.
import h5py

with h5py.File('file1e.h5','r+') as h5f1, \
     h5py.File('file2.h5','r') as h5f2 :

    print (h5f1['ds_1e'].shape, h5f1['ds_1e'].maxshape)
    print (h5f2['ds_2'].shape, h5f2['ds_2'].maxshape)

    arr1_a0 = h5f1['ds_1e'].shape[0]
    arr2_a0 = h5f2['ds_2'].shape[0]
    arr3_a0 = arr1_a0 + arr2_a0

    # grow ds_1e along axis 0 to make room for the rows from ds_2
    h5f1['ds_1e'].resize(arr3_a0,axis=0)

    # read ds_2 into a numpy array and write it into the new rows
    xfer_arr2 = h5f2['ds_2'][:]
    h5f1['ds_1e'][arr1_a0:arr3_a0, :] = xfer_arr2

    print (h5f1['ds_1e'].shape, h5f1['ds_1e'].maxshape)
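The resize-then-write step can also be put in a small helper that refuses to run when maxshape does not allow the new size. Again, just a sketch under my own assumptions: append_rows is a hypothetical name, and it only handles 2-D datasets growing along axis 0.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dset, new_rows):
    """Append `new_rows` along axis 0 of a 2-D `dset` (hypothetical helper)."""
    old_len = dset.shape[0]
    new_len = old_len + new_rows.shape[0]
    limit = dset.maxshape[0]
    # maxshape[0] is None for an unlimited axis, otherwise an upper bound.
    if limit is not None and new_len > limit:
        raise ValueError(f"{dset.name} cannot grow to {new_len} rows "
                         f"(maxshape={dset.maxshape})")
    dset.resize(new_len, axis=0)
    dset[old_len:new_len, :] = new_rows


arr1 = np.array([[1, 3, 5], [5, 4, 9], [6, 8, 0]])
arr2 = np.array([[6, 1, 9], [8, 2, 7]])

# Demo on a throwaway file so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.h5")
with h5py.File(demo_path, "w") as h5f:
    growable = h5f.create_dataset("growable", data=arr1, maxshape=(None, 3))
    fixed = h5f.create_dataset("fixed", data=arr1)

    # Resizable dataset: the new rows land after the existing ones.
    append_rows(growable, arr2)
    appended = growable[:]

    # Fixed-size dataset: the helper raises instead of resizing.
    try:
        append_rows(fixed, arr2)
        fixed_error = None
    except ValueError as err:
        fixed_error = str(err)

print(appended.shape)
print(fixed_error)
```

When the helper raises, fall back to Method 1 and write the combined data to a new dataset.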
Code to create the example files used above:
import h5py
import numpy as np

arr1 = np.array([[ 1, 3, 5 ],
                 [ 5, 4, 9 ],
                 [ 6, 8, 0 ],
                 [ 7, 2, 5 ],
                 [ 2, 1, 2 ]] )

# file1: fixed-size dataset (no maxshape=), used in Method 1
with h5py.File('file1.h5','w') as h5f:
    h5f.create_dataset('ds_1',data=arr1)
    print (h5f['ds_1'].maxshape)

# file1e: resizable dataset (unlimited along axis 0), used in Method 2
with h5py.File('file1e.h5','w') as h5f:
    h5f.create_dataset('ds_1e',data=arr1, shape=(5,3), maxshape=(None,3))
    print (h5f['ds_1e'].maxshape)

arr2 = np.array([[ 6, 1, 9 ],
                 [ 8, 2, 7 ]] )

# file2: source of the rows to be added in both methods
with h5py.File('file2.h5','w') as h5f:
    h5f.create_dataset('ds_2',data=arr2)