I am trying to optimize deep-learning computer vision pipelines for HPC architectures that have high-performance parallel IO. Storing large numbers of files in a single directory is an anti-pattern on such systems, so I expect much better IO performance if I collect the images and write them into a single large file.
Which file formats are best suited to this task? Are there Python libraries that can write large numbers of image files into a single binary file? I came across GEIS files, which look fit for purpose, but I cannot find any examples demonstrating their usage.
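For illustration, here is a minimal sketch of the "many images into one file" idea using only the Python standard library. It assumes an uncompressed tar archive as the container, which is the shard format used by WebDataset, and copies the already-encoded image bytes (PNG, JPEG, ...) verbatim without decoding them. It is not a GEIS example. The function names `pack_images`, `read_image_bytes` and `iter_shard` are hypothetical helpers, not part of any library.

```python
import tarfile
from pathlib import Path


def pack_images(image_paths, shard_path):
    """Write many image files into one uncompressed tar shard (hypothetical helper).

    The encoded bytes of each file are copied as-is; nothing is decoded.
    """
    with tarfile.open(shard_path, mode="w") as tar:
        for path in image_paths:
            path = Path(path)
            # Keep only the file name so the shard has a flat namespace.
            tar.add(str(path), arcname=path.name)


def read_image_bytes(shard_path, name):
    """Return the raw encoded bytes of one member (random access by name)."""
    with tarfile.open(shard_path, mode="r") as tar:
        member = tar.extractfile(name)  # raises KeyError if name is absent
        if member is None:  # member exists but is not a regular file
            raise KeyError(name)
        return member.read()


def iter_shard(shard_path):
    """Yield (name, bytes) pairs in a single sequential pass over the shard."""
    # "r|" opens the archive as a non-seekable stream, which suits
    # large sequential reads on a parallel file system.
    with tarfile.open(shard_path, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                yield member.name, tar.extractfile(member).read()
```

Usage: `pack_images(sorted(Path("images").glob("*.png")), "shard-000000.tar")` writes one file, and `for name, data in iter_shard("shard-000000.tar"): ...` streams it back. Decoding `data` into an array is left to whichever image library the pipeline already uses.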