I have a data generator that works, but it is extremely slow at reading batches from a 200k-image dataset.
I use:

    X = f[self.trName][idx * self.batch_size:(idx + 1) * self.batch_size]

after having opened the file with f = h5py.File(fileName, 'r').
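For context, here is a minimal sketch of the generator as it is set up (assuming a keras.utils.Sequence subclass; trName and batch_size are from the snippet above, everything else is filled in only for illustration):

    import h5py
    from tensorflow.keras.utils import Sequence

    class HDF5Generator(Sequence):
        def __init__(self, fileName, trName, batch_size):
            # file handle kept open for the generator's lifetime
            self.f = h5py.File(fileName, 'r')
            self.trName = trName
            self.batch_size = batch_size

        def __len__(self):
            # number of batches per epoch
            return self.f[self.trName].shape[0] // self.batch_size

        def __getitem__(self, idx):
            # the slow line: reads one batch-sized slice from disk
            X = self.f[self.trName][idx * self.batch_size:(idx + 1) * self.batch_size]
            # the targets live in a second dataset in the same file
            # and are sliced the same way
            return X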
It seems to get slower as idx grows (sequential access?), but in any case it takes at least 10 seconds (sometimes >20 s) to read a single batch, which is far too slow (especially since I am reading from an SSD!).
Any ideas?
The dataset takes 50.4 GB on disk (compressed), and its shape is (210000, 2, 128, 128).
(This is the shape of the training set; the targets have the same shape and are stored as another dataset inside the same .h5 file.)
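Since the file is compressed, the chunk layout may be relevant; this is how the on-disk layout can be inspected (a quick sketch; the file and dataset names below are hypothetical placeholders):

    import h5py

    with h5py.File('data.h5', 'r') as f:       # hypothetical file name
        ds = f['train_x']                      # hypothetical dataset name
        print(ds.shape, ds.dtype)              # (210000, 2, 128, 128) and element dtype
        print('chunks:', ds.chunks)            # chunk shape, or None if contiguous
        print('compression:', ds.compression)  # filter name, e.g. 'gzip'
        print('opts:', ds.compression_opts)    # filter options, e.g. gzip level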