I'm trying to store a list of variable length arrays in an HDF file with the following procedure:
phn_mfccs = []
# Import wav files
for waveform in files:
phn_mfcc = mfcc(waveform) # produces a variable length multidim array of the shape (x, 13, 1)
# Add MFCC and label to dataset
# phn_mfccs has dimension (len(files),)
# phn_mfccs[i] has variable dimension ([# of frames in ith segment] (variable), 13, 1)
phn_mfccs.append(phn_mfcc)
dt = h5py.special_dtype(vlen=np.dtype('float64'))
mfccs_out.create_dataset('phn_mfccs', data=phn_mfccs, dtype=dt)
It seems like my datatypes aren't working out though -- instead of each element of the mfccs_out dataset containing a multidimensional array, it contains just a 1D array. e.g. if the first phn_mfcc
I append originally has dimension (59,13,1)
, mfccs_out['phn_mfccs'][0]
has dimension (59,)
.
I suspect it is because I'm just using a float64 datatype, and I need something else for an array of arrays? If I don't specify the dataset or try to use dtype='O'
, though, it spits out an error like "Object dtype 'O' has no native HDF equivalent."
Ideally, what I'd like is for mfccs_out['phn_mfccs'][i]
to contain the ith phn_mfcc
that I appended to the list phn_mfccs
.