I have large image datasets to train CNNs on. Since I cannot load all the images into RAM, I plan to dump them into an HDF5 file (with h5py) and then iterate over the set batch-wise, as suggested in
Most efficient way to use a large data set for PyTorch?
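For context, the kind of batch-wise access I am aiming for looks roughly like this (a simplified sketch; the file name `images.h5` and dataset name `images` are just placeholders for my data):

```python
import h5py
import torch
from torch.utils.data import Dataset, DataLoader

class HDF5Dataset(Dataset):
    """Reads individual images from a single HDF5 dataset by index."""
    def __init__(self, path, dataset_name="images"):
        self.path = path
        self.dataset_name = dataset_name
        self._file = None  # open lazily so the object stays picklable for DataLoader workers

    def __len__(self):
        with h5py.File(self.path, "r") as f:
            return f[self.dataset_name].shape[0]

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        img = self._file[self.dataset_name][idx]  # numpy array, e.g. (H, W, C)
        return torch.from_numpy(img)

loader = DataLoader(HDF5Dataset("images.h5"), batch_size=32, shuffle=True)
```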
I tried creating a separate dataset for every image, all located in the same group, which is very fast to write. However, I could not figure out how to iterate over all datasets in the group other than accessing each one by its name. As an alternative, I tried putting all the images iteratively into one dataset by extending its shape (roughly sketched below the links), according to
How to append data to one specific dataset in a hdf5 file with h5py and
incremental writes to hdf5 with h5py
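Simplified, the two writing approaches I tried look roughly like this (the image shape, file names, and random data are just placeholders for my actual pipeline):

```python
import h5py
import numpy as np

# Placeholder for my real image source
images = [np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8) for _ in range(100)]

# Approach 1: one dataset per image inside a group -- fast to write,
# but I only know how to read a dataset back by its name, e.g. f["images/img_000042"].
with h5py.File("per_image.h5", "w") as f:
    grp = f.create_group("images")
    for i, img in enumerate(images):
        grp.create_dataset(f"img_{i:06d}", data=img)

# Approach 2: one resizable dataset, extended for every image -- very slow for me.
with h5py.File("single_dataset.h5", "w") as f:
    dset = f.create_dataset(
        "images",
        shape=(0, 128, 128, 3),
        maxshape=(None, 128, 128, 3),
        dtype=np.uint8,
        chunks=(1, 128, 128, 3),
    )
    for i, img in enumerate(images):
        dset.resize(i + 1, axis=0)
        dset[i] = img
```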
but this is very slow. Is there a faster way to create an HDF5 dataset to iterate over?