I am training PyTorch models on various datasets. The datasets up to this point have been images, so I can read them on the fly when needed using cv2 or PIL, which is fast.
Now I am presented with a dataset of tensor objects of shape [400, 400, 8]. In the past I have tried loading such objects with PyTorch's and NumPy's built-in tensor reading operations, but these are generally much slower than reading images.
The objects are currently stored in compressed HDF5 files (via h5py), with ~800 objects per file. My plan is to save the objects individually in some format and then read them on the fly, roughly as sketched below, but I am unsure which format would be fastest to read.
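For concreteness, this is a minimal sketch of the conversion I have in mind, assuming the ~800 objects sit in a single HDF5 dataset per file; the file path and dataset name are placeholders for my actual layout:

```python
import h5py
import numpy as np
from pathlib import Path

# Split one compressed HDF5 file into one uncompressed .npy file per object.
# "data/train_000.h5" and the "tensors" dataset name are placeholders.
src = Path("data/train_000.h5")
out_dir = Path("data/train_npy")
out_dir.mkdir(parents=True, exist_ok=True)

with h5py.File(src, "r") as f:
    dset = f["tensors"]  # assumed shape: (~800, 400, 400, 8)
    for i in range(dset.shape[0]):
        # dset[i] reads (and decompresses) just this one object
        np.save(out_dir / f"{src.stem}_{i:04d}.npy", dset[i])
```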
I would like to avoid keeping them all in memory, as I believe the memory requirement would be too high (assuming float32, each object is 400 × 400 × 8 × 4 bytes ≈ 5 MB, so a single file of ~800 objects is already ~4 GB).
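To show what "read them on the fly" would look like in my training code, here is a minimal Dataset sketch, assuming each object has already been written out as its own .npy file (the directory name is a placeholder):

```python
import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset

class TensorFileDataset(Dataset):
    """Loads one [400, 400, 8] object per __getitem__, so the full
    dataset never has to be held in memory at once."""

    def __init__(self, root="data/train_npy"):  # placeholder directory
        self.paths = sorted(Path(root).glob("*.npy"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        arr = np.load(self.paths[idx])  # reads just this one object from disk
        return torch.from_numpy(arr)    # zero-copy wrap into a torch tensor
```

The idea is that this gets wrapped in a regular DataLoader so objects are read per item, the same pattern I use for images.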