
If hardware is not a limiting factor, what's the fastest way to take a large number of high-res JPEG images and downsize them all? For example, if I have a folder of 20,000 JPEG images that vary in aspect ratio, but are all fairly large (near 4K resolution), and I'd like to resize every image to 512x512.

I've tried Python's pillow-simd with libjpeg-turbo and multiprocessing on a machine with a pretty beefy CPU and a V100 GPU (although I don't believe the GPU is utilized), and it still takes something like 90 minutes to complete the job.
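
Roughly this kind of pipeline (a simplified sketch, not the exact code; the glob pattern, output directory, resampling filter and JPEG quality are placeholders):

# A simplified sketch of the Pillow (pillow-simd) + multiprocessing approach
# described above. Paths, pool size and filter settings are placeholders.
import glob
import os
from multiprocessing import Pool
from PIL import Image

OUT_DIR = "resized"  # placeholder output directory

def resize_one(path):
    with Image.open(path) as im:
        im.resize((512, 512), Image.LANCZOS).save(
            os.path.join(OUT_DIR, os.path.basename(path)), quality=90)

if __name__ == "__main__":
    os.makedirs(OUT_DIR, exist_ok=True)
    with Pool() as pool:
        pool.map(resize_one, glob.glob("images/*.jpg"))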

Does anyone know of an image downsizing method that can take advantage of a powerful GPU or has some other significant speed optimizations? Or is this really the current state-of-the-art for image downsizing speed?

Austin
  • I show and benchmark several methods here https://stackoverflow.com/a/51822265/2836621 – Mark Setchell May 01 '19 at 18:35
  • Note also, that as you are doing some pretty significant size reduction, you should try and use *"shrink-on-load"* as it makes a massive difference... https://stackoverflow.com/a/32169224/2836621 – Mark Setchell May 01 '19 at 19:57
  • Hmm haven't seen that before I'll test it out – Austin May 01 '19 at 20:00
  • I have never tried it with Pillow but I think that's what the `Image.draft()` method is... https://pillow.readthedocs.io/en/3.1.x/reference/Image.html – Mark Setchell May 01 '19 at 20:02
  • Thanks I was just looking for something like that! – Austin May 01 '19 at 20:03
  • Note also, that if you use Python and GNU Parallel for parallelisation, you should write your Python to accept a whole list of filenames. That way, instead of starting a whole new Python interpreter for each image, you can use `parallel -X ...` and it will pass as many image names as possible to each Python process, so the cost of starting Python will be amortised over many, many images. – Mark Setchell May 01 '19 at 20:18
  • One issue is what you want to do about quality? You could just do blind sampling or you could average values. The former is faster but tends to give poorer results. – user3344003 May 06 '19 at 19:33
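
A minimal sketch combining the two suggestions from the comments above -- a script that accepts a whole list of filenames (so it can be driven by `parallel -X`) and uses `Image.draft()` for shrink-on-load. The output directory, target size, resampling filter and JPEG quality here are placeholders:

# resize_batch.py -- accepts many filenames per invocation and uses
# Image.draft() to ask libjpeg to decode at a reduced scale before the
# final resize, so far less pixel data is ever decompressed.
import os
import sys
from PIL import Image

OUT_DIR = "resized"  # placeholder output directory

def shrink(path):
    with Image.open(path) as im:
        im.draft("RGB", (512, 512))  # shrink-on-load hint to the JPEG decoder
        im = im.resize((512, 512), Image.LANCZOS)
        im.save(os.path.join(OUT_DIR, os.path.basename(path)), quality=90)

if __name__ == "__main__":
    os.makedirs(OUT_DIR, exist_ok=True)
    for path in sys.argv[1:]:
        shrink(path)

Invoked with something like `find . -name '*.jpg' | parallel -X python resize_batch.py`, each Python process receives a long list of filenames, so the interpreter start-up cost is spread over many images.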

1 Answer


I've done some of this kind of heavy image processing in the past. There's an open-source framework called OpenCV (computer vision) that works with Python, C++ and Java. OpenCV uses matrices (`Mat`) to do all kinds of image manipulation; resizing is a piece of cake. This should give you a rough idea.

The Java version of the code might look something like this:

import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import static org.opencv.imgproc.Imgproc.*;
import static org.opencv.imgcodecs.Imgcodecs.imread;
import static org.opencv.imgcodecs.Imgcodecs.imwrite;

// load the native OpenCV library once, before any other OpenCV calls
System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

// loop over your array of files here
Mat src = imread(myFilePath);
Mat resized = new Mat();
Size scaleSize = new Size(512, 512);
resize(src, resized, scaleSize, 0, 0, INTER_AREA);
imwrite(myOutputPath, resized); // write each result to its own output path

If you want to do super-high-speed image processing, C++ will work better. I did real-time movie image processing with OpenCV and C++, and it had all the horsepower needed to process 36 frames per second. With Java, the output was 4 frames per second.
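
Since OpenCV also has Python bindings, the same operation in Python looks roughly like this (input/output paths are placeholders):

# Rough Python (cv2) equivalent of the Java snippet above; paths are placeholders.
import cv2

src = cv2.imread("input.jpg")
resized = cv2.resize(src, (512, 512), interpolation=cv2.INTER_AREA)
cv2.imwrite("output.jpg", resized)

INTER_AREA averages source pixels when shrinking, which generally gives better results for heavy downsizing than nearest-neighbour sampling.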

Brian
  • Is the C++ version significantly faster than the Python version? I thought the underlying code in the Python version was also written in C for performance. – Austin May 01 '19 at 19:28
  • I didn't try Python, but I know OpenCV supports it. Python is, clearly, a lot easier to program than C++, so starting with Python is a good idea. If you don't get the performance you want, you'd have to move to C++ to get top speed. – Brian May 02 '19 at 14:47