
What is the best format for saving images to disk so that they load as fast as possible? I suspect that .JPG might be suboptimal; maybe some other format works better?

My case: I have about 9k .JPG images (large, 8000x8000) that I load (using PIL for the moment) for training with PyTorch on 2 GPUs. I use a large batch size of 256 and am noticing that the bottleneck is loading these images (which are then resized, augmented, and so on).
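Since the images are resized after loading anyway, one thing worth trying is letting the JPEG decoder do part of the downscaling. Pillow's `Image.draft()` can configure the JPEG decoder to decode at 1/2, 1/4 or 1/8 scale (an 8000x8000 image at 1/8 scale is 1000x1000), which can be noticeably cheaper than decoding at full size and resizing afterwards. The sketch below is only an illustration: `load_downscaled_jpeg` is a hypothetical helper name, and it assumes RGB output and a fixed `target_size`. It could be called from a PyTorch `Dataset.__getitem__` in place of a plain `Image.open(...)`.

```python
from PIL import Image


def load_downscaled_jpeg(path, target_size):
    """Open a JPEG and let the decoder reduce it by 1/2, 1/4 or 1/8 while decoding.

    Image.draft() only configures the decoder, so it must be called before the
    pixel data is loaded. The decoder picks a reduced size that is still at
    least as large as target_size, so a final resize may still be needed.
    For non-JPEG files draft() simply has no effect and the full image loads.
    """
    with Image.open(path) as img:
        img.draft("RGB", target_size)  # hint: decode at a reduced scale
        rgb = img.convert("RGB")       # forces the (reduced) decode
    if rgb.size != target_size:
        rgb = rgb.resize(target_size, Image.BILINEAR)
    return rgb
```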

Does anyone know of any benchmarks comparing different saving and loading techniques?
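No benchmark numbers appear in this thread, but a small harness makes it easy to measure on one's own data. The sketch below writes the same pixel array as JPEG (quality 95), PNG, BMP and an uncompressed NumPy `.npy` file, then reports file size and best-of-N load time for each. The function names and the quality setting are assumptions chosen for illustration. Results depend heavily on image content, disk and CPU, so it is best fed a real training image rather than random noise (which compresses unusually poorly). Also note that after the first read the file is usually served from the OS page cache, so this mostly measures decoding cost rather than disk I/O.

```python
import os
import tempfile
import time

import numpy as np
from PIL import Image


def time_loader(load_fn, path, repeats=5):
    """Return the best-of-`repeats` wall-clock time (seconds) to fully load `path`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn(path)
        best = min(best, time.perf_counter() - start)
    return best


def pil_load(path):
    """Decode an image file with Pillow into an HxWx3 uint8 array."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


def benchmark(pixels, repeats=5):
    """Save an HxWx3 uint8 array in several formats and time loading each one.

    Returns {format_name: (file_size_bytes, best_load_seconds)}.
    """
    img = Image.fromarray(pixels)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = {
            "jpeg_q95": os.path.join(tmp, "img.jpg"),
            "png": os.path.join(tmp, "img.png"),
            "bmp": os.path.join(tmp, "img.bmp"),
            "npy": os.path.join(tmp, "img.npy"),
        }
        img.save(paths["jpeg_q95"], quality=95)
        img.save(paths["png"])
        img.save(paths["bmp"])
        np.save(paths["npy"], pixels)
        for name, path in paths.items():
            loader = np.load if name == "npy" else pil_load
            results[name] = (os.path.getsize(path), time_loader(loader, path, repeats))
    return results
```

For example, `benchmark(np.asarray(Image.open(path).convert("RGB")))`, with `path` pointing at one of the training images, returns a dict mapping each format name to `(file_size_bytes, seconds)`.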

AnarKi
  • You can always compress them more to reduce IO overhead. There's no "best" format, it's always about *tradeoffs*. Lower quality = faster loading. Higher quality = larger file size. – tadman Aug 30 '23 at 15:41
  • @tadman What do you mean by compressing them? Just resizing them to something smaller? – AnarKi Aug 30 '23 at 15:46
  • I mean JPEG has variable compression: you can trade file size for quality. It's a "lossy" format, meaning that if you want the files to be smaller, that is an option, but at some point the image will be trashed. Find a setting that works for you. – tadman Aug 30 '23 at 15:55
  • I would think that a direct bitmapped image format would be fastest. Sure, the file itself is bigger, but after reading it, you don't need much postprocessing. – John Gordon Aug 30 '23 at 17:05
  • Some ideas here https://stackoverflow.com/a/51822265/2836621 – Mark Setchell Aug 30 '23 at 21:50
  • You can use `libjpeg-turbo`, see https://pillow.readthedocs.io/en/stable/releasenotes/5.4.0.html#check-for-libjpeg-turbo – Mark Setchell Aug 30 '23 at 21:51
  • You can store your images on an SSD. You could try multi-threading. – Mark Setchell Aug 30 '23 at 21:52
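Along the lines of the compression comments, here is a sketch that re-encodes one image in memory at a few JPEG quality levels to inspect the file-size trade-off, plus a one-off preprocessing helper that shrinks an image and saves it at a chosen quality (which also answers the "just resize?" question). Both function names, the `max_side` of 1024 and the quality values are placeholders; the right resolution is whatever the training transforms actually consume. Because JPEG decoding work grows with the number of pixels, storing the images closer to training resolution will likely matter more than the quality setting alone.

```python
import io
import time

from PIL import Image


def jpeg_quality_sweep(img, qualities=(95, 85, 75, 50)):
    """Re-encode `img` as JPEG at several qualities, entirely in memory.

    Returns {quality: (encoded_bytes, decode_seconds)} so the size/speed
    trade-off can be inspected before re-encoding a whole dataset.
    """
    img = img.convert("RGB")
    results = {}
    for q in qualities:
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        data = buf.getvalue()
        start = time.perf_counter()
        Image.open(io.BytesIO(data)).load()  # full decode
        results[q] = (len(data), time.perf_counter() - start)
    return results


def shrink_and_save(src_path, dst_path, max_side=1024, quality=90):
    """Hypothetical preprocessing step: downscale once (keeping aspect ratio), save as JPEG."""
    with Image.open(src_path) as img:
        img = img.convert("RGB")
        img.thumbnail((max_side, max_side))  # in place; never enlarges
        img.save(dst_path, format="JPEG", quality=quality)
```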
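Following the suggestion to store uncompressed bitmaps, one option is to decode each image once and save the raw `uint8` array with `np.save`, then open it with `np.load(..., mmap_mode="r")`. Loading then involves no decoding at all, but the files are large: at full resolution, 8000 x 8000 x 3 bytes is 192 MB per image, roughly 1.7 TB for 9k images, so this mainly makes sense after resizing to the training resolution. `save_raw` and `load_raw` are hypothetical helper names, and the optional `size` argument is an assumption.

```python
import numpy as np
from PIL import Image


def save_raw(src_path, dst_path, size=None):
    """Decode an image once and store its raw uint8 RGB pixels as a .npy file.

    If `size` (width, height) is given, the image is resized before saving.
    """
    with Image.open(src_path) as img:
        img = img.convert("RGB")
        if size is not None:
            img = img.resize(size, Image.BILINEAR)
    np.save(dst_path, np.asarray(img))


def load_raw(path):
    """Memory-map the .npy file; pixel data is read from disk only when accessed."""
    return np.load(path, mmap_mode="r")
```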
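The libjpeg-turbo and multi-threading suggestions can be checked and tried in a few lines: `PIL.features.check_feature("libjpeg_turbo")` reports whether the installed Pillow was built against libjpeg-turbo, and a thread pool can decode several files concurrently. The sketch assumes that Pillow's C decoders release the GIL while decoding, so threads can give real parallelism; the worker count and helper names are illustrative. Inside a PyTorch training loop, the more usual route is `torch.utils.data.DataLoader(..., num_workers=N)`, which uses worker processes instead of threads.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from PIL import Image, features


def uses_libjpeg_turbo():
    """True/False depending on whether this Pillow build uses libjpeg-turbo."""
    return features.check_feature("libjpeg_turbo")


def _decode(path):
    """Decode one image file into an HxWx3 uint8 array."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


def load_many(paths, workers=8):
    """Decode images concurrently with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_decode, paths))
```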
