
I am trying to optimize deep learning computer vision pipelines for HPC architectures that have high-performance parallel I/O. Storing large numbers of files in a single directory is an anti-pattern on such systems; much better I/O performance can be achieved by collecting the images and writing them into a single large file.

What file formats are best suited for this task? Do Python libraries exist for writing large numbers of image files into a single binary file? I came across GEIS files, which look fit for purpose, but I cannot find examples demonstrating their usage.

davidrpugh
  • You could look at HDF5 or LMDB. – warped Apr 25 '19 at 06:47
  • I use uncompressed zip for this. It's very simple, widely supported, and there's a zip64 standard for huge files. I use libgsf to write these things and it seems fine, but perhaps there's a better library. libgsf has a Python binding via gobject. – jcupitt Apr 26 '19 at 16:46
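The uncompressed-zip approach from the second comment can be sketched with Python's standard-library `zipfile` module rather than libgsf; `ZIP_STORED` disables compression and `allowZip64=True` enables archives beyond 4 GiB. The file names and synthetic array contents below are made up for illustration:

```python
import io
import zipfile

import numpy as np

# Synthetic 64x64 grayscale "images" standing in for real image files.
images = {f"img_{i:05d}.raw": np.full((64, 64), i, dtype=np.uint8)
          for i in range(3)}

# In-memory buffer for the demo; on an HPC system this would be one
# large archive file on the parallel file system.
buf = io.BytesIO()

# ZIP_STORED writes members uncompressed; allowZip64 permits >4 GiB archives.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED,
                     allowZip64=True) as zf:
    for name, arr in images.items():
        zf.writestr(name, arr.tobytes())

# The zip central directory allows random access to any member by name
# without scanning the whole archive.
with zipfile.ZipFile(buf, "r") as zf:
    names = zf.namelist()
    restored = np.frombuffer(zf.read("img_00002.raw"),
                             dtype=np.uint8).reshape(64, 64)

print(names)
```

Because the members are stored uncompressed, each image's bytes sit contiguously in the archive, so readers can also memory-map the file and slice members out directly once their offsets are known.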

0 Answers