I have an original HDF5 file with a dataset of shape (3737, 224, 224, 3) that was not extendable, i.e. no maxshape argument was passed during its creation.
I decided to create a new HDF5 file and create the dataset with maxshape=(None, 224, 224, 3) so that I can resize it later. I then just copied the dataset from the original HDF5 file into this new one and saved. Roughly what I did (filenames and dataset name are simplified here):
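```python
import h5py

with h5py.File("original.h5", "r") as src, h5py.File("resizable.h5", "w") as dst:
    data = src["images"][:]              # load the (3737, 224, 224, 3) dataset into memory
    dst.create_dataset(
        "images",
        data=data,
        maxshape=(None, 224, 224, 3),    # only the first axis is resizable
    )
```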
The contents of the two HDF5 files are exactly the same. I then tried to read all the data back and found significant performance degradation with the resizable version. The read I timed (the same code for both files, something like this):
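```python
import time
import h5py

t0 = time.perf_counter()
with h5py.File("resizable.h5", "r") as f:   # or "original.h5"
    data = f["images"][:]                   # read the full dataset into memory
print(f"Wall time: {time.perf_counter() - t0:.2f} s")
```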
Original: CPU times: user 660 ms, sys: 2.58 s, total: 3.24 s Wall time: 6.08 s
Resizable: CPU times: user 18.6 s, sys: 4.41 s, total: 23 s Wall time: 49.5 s
That's almost 10 times as slow. Is this to be expected? The file size difference is less than 2 MB. Are there optimization tips/tricks I need to be aware of?