
I'm building a photo gallery in Python and want to be able to quickly generate thumbnails for the high resolution images.

What's the fastest way to generate high quality thumbnails for a variety of image sources?

Should I be using an external library like ImageMagick, or is there an efficient built-in way to do this?

The dimensions of the resized images will be (max size):

120x120
720x720
1600x1600

Quality is an issue, as I want to preserve as many of the original colors as possible and minimize compression artifacts.

Thanks.

ensnare
  • You can use Python Wand, which calls Imagemagick to do that. I cannot say if it is the fastest. Python OpenCV may be faster. – fmw42 Aug 12 '18 at 00:00
  • The [thumbnails](https://pypi.org/project/thumbnails/) package is the most convenient one. You can use its Python API or its CLI in your automation scripts. – Artyom Vancyan May 10 '23 at 14:31

8 Answers


I fancied some fun so I did some benchmarking on the various methods suggested above and a few ideas of my own.

I collected together 1000 high-resolution 12MP iPhone 6S images, each 4032x3024 pixels, and used an 8-core iMac.

Here are the techniques and results - each in its own section.
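The Bash scripts time themselves with `$SECONDS`, which only counts whole seconds, and the Python scripts below show no timing code at all. If you want to repeat the comparison from Python, a minimal sketch (the `time_it` helper is a hypothetical name, not part of any library) could wrap each method like this:

```python
import time


def time_it(func, *args, **kwargs):
    """Call func(*args, **kwargs) once; return (result, elapsed_seconds)."""
    # perf_counter() is a high-resolution monotonic clock, unlike Bash's
    # $SECONDS, which only counts whole seconds.
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with the thumbnailing code you want to time
    total, seconds = time_it(sum, range(1_000_000))
    print(f"sum={total} time={seconds:.3f}s")
```

To time one of the Python methods, move its top-level loop into a function and pass that function to `time_it`.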


Method 1 - Sequential ImageMagick

This is simplistic, unoptimised code. Each image is read and one thumbnail is produced; then it is read again for each of the other two sizes.

#!/bin/bash

start=$SECONDS
# Loop over all files
for f in image*.jpg; do
   # Loop over all sizes
   for s in 1600 720 120; do
      echo Reducing $f to ${s}x${s}
      convert "$f" -resize ${s}x${s} t-$f-$s.jpg
   done
done
echo Time: $((SECONDS-start))

Result: 170 seconds


Method 2 - Sequential ImageMagick with single load and successive resizing

This is still sequential but slightly smarter: each image is read just once, not three times, and the loaded image is then resized down successively and saved at each of the three resolutions.

#!/bin/bash

start=$SECONDS
# Loop over all files
N=1
for f in image*.jpg; do
   echo Resizing $f
   # Load once and successively scale down
   convert "$f"                              \
      -resize 1600x1600 -write t-$N-1600.jpg \
      -resize 720x720   -write t-$N-720.jpg  \
      -resize 120x120          t-$N-120.jpg
   ((N=N+1))
done
echo Time: $((SECONDS-start))

Result: 76 seconds
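If you'd rather drive Method 2 from Python, the same single-load, successive-resize command can be built as an argument list and handed to `subprocess.run`. This is a sketch assuming ImageMagick 6's `convert` is on your `PATH` (on ImageMagick 7 the preferred entry point is `magick`); the helper names are illustrative:

```python
import subprocess


def successive_resize_cmd(src, index, sizes=(1600, 720, 120), prefix="t"):
    """Build one `convert` call that loads src once and writes ever-smaller
    copies, mirroring the Method 2 script."""
    cmd = ["convert", src]
    # Largest first: each -resize works on the result of the previous one
    *intermediate, last = sorted(sizes, reverse=True)
    for s in intermediate:
        # -resize then -write saves this size and carries on with the image
        cmd += ["-resize", f"{s}x{s}", "-write", f"{prefix}-{index}-{s}.jpg"]
    # The smallest size is the command's final output file
    cmd += ["-resize", f"{last}x{last}", f"{prefix}-{index}-{last}.jpg"]
    return cmd


def make_thumbnails(src, index):
    # Raises CalledProcessError if convert exits with a non-zero status
    subprocess.run(successive_resize_cmd(src, index), check=True)
```

Passing a list rather than a shell string avoids quoting problems with unusual filenames.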


Method 3 - GNU Parallel + ImageMagick

This builds on the previous method by using GNU Parallel to process N images at once, where N is the number of CPU cores on your machine.

#!/bin/bash

start=$SECONDS

doit() {
   file=$1
   index=$2
   convert "$file"                               \
      -resize 1600x1600 -write t-$index-1600.jpg \
      -resize 720x720   -write t-$index-720.jpg  \
      -resize 120x120          t-$index-120.jpg
}

# Export doit() to subshells for GNU Parallel   
export -f doit

# Use GNU Parallel to do them all in parallel
parallel doit {} {#} ::: *.jpg

echo Time: $((SECONDS-start))

Result: 18 seconds
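GNU Parallel's role here is simply to keep one external process running per core. A rough Python equivalent, assuming you already have a list of command lines such as the `convert` calls above, is a thread pool that waits on child processes. This is a sketch (`run_all` is a hypothetical name), not a replacement for Parallel's many features:

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, workers=None):
    """Run external commands concurrently; return exit codes in input order.

    Threads are enough here: each thread just waits on a child process,
    and the real work happens in those separate processes.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Demo with harmless Python child processes instead of ImageMagick
    demo = [[sys.executable, "-c", f"print({i})"] for i in range(4)]
    print(run_all(demo))
```

For the thumbnails you would pass something like `[["convert", f, "-resize", "120x120", f"t-{i}-120.jpg"] for i, f in enumerate(files, 1)]`, where `files` is your list of images.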


Method 4 - GNU Parallel + vips

This is the same as the previous method, but it uses vips at the command-line instead of ImageMagick.

#!/bin/bash

start=$SECONDS

doit() {
   file=$1
   index=$2
   r0=t-$index-1600.jpg
   r1=t-$index-720.jpg
   r2=t-$index-120.jpg
   vipsthumbnail "$file"  -s 1600 -o "$r0"
   vipsthumbnail "$r0"    -s 720  -o "$r1"
   vipsthumbnail "$r1"    -s 120  -o "$r2"
}

# Export doit() to subshells for GNU Parallel   
export -f doit

# Use GNU Parallel to do them all in parallel
parallel doit {} {#} ::: *.jpg

echo Time: $((SECONDS-start))

Result: 8 seconds


Method 5 - Sequential PIL

This is intended to correspond to Jakob's answer.

#!/usr/local/bin/python3

import glob
from PIL import Image

sizes = [(120,120), (720,720), (1600,1600)]
files = glob.glob('image*.jpg')

N=0
for image in files:
    for size in sizes:
        im=Image.open(image)
        im.thumbnail(size)
        im.save("t-%d-%s.jpg" % (N,size[0]))
    N=N+1

Result: 38 seconds


Method 6 - Sequential PIL with single load & successive resize

This is intended as an improvement to Jakob's answer, wherein the image is loaded just once and then resized down three times instead of re-loading each time to produce each new resolution.

#!/usr/local/bin/python3

import glob
from PIL import Image

sizes = [(120,120), (720,720), (1600,1600)]
files = glob.glob('image*.jpg')

N=0
for image in files:
   # Load just once, then successively scale down
   im=Image.open(image)
   im.thumbnail((1600,1600))
   im.save("t-%d-1600.jpg" % (N))
   im.thumbnail((720,720))
   im.save("t-%d-720.jpg"  % (N))
   im.thumbnail((120,120))
   im.save("t-%d-120.jpg"  % (N))
   N=N+1

Result: 27 seconds
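`Image.thumbnail()` resizes in place, keeps the aspect ratio and never enlarges, so the three successive calls above do not produce square images. The sketch below (a hypothetical `fit_within` helper, not Pillow's code) computes the sizes you should expect for one of the 4032x3024 test images; Pillow's own rounding may differ by a pixel for other sizes:

```python
def fit_within(width, height, max_w, max_h):
    """Largest same-aspect size that fits in max_w x max_h, never enlarging."""
    scale = min(max_w / width, max_h / height, 1.0)
    # Round to whole pixels, but never collapse a side to zero
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Successive reductions, as in Method 6, starting from a 4032x3024 photo
    size = (4032, 3024)
    for box in (1600, 720, 120):
        size = fit_within(*size, box, box)
        print(size)
```

Run as a script, this prints (1600, 1200), (720, 540) and (120, 90).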


Method 7 - Parallel PIL

This is intended to correspond to Audionautics' answer, insofar as it uses Python's multiprocessing. It also obviates the need to re-load the image for each thumbnail size.

#!/usr/local/bin/python3

import glob
from PIL import Image
from multiprocessing import Pool 

def thumbnail(params): 
    filename, N = params
    try:
        # Load just once, then successively scale down
        im=Image.open(filename)
        im.thumbnail((1600,1600))
        im.save("t-%d-1600.jpg" % (N))
        im.thumbnail((720,720))
        im.save("t-%d-720.jpg"  % (N))
        im.thumbnail((120,120))
        im.save("t-%d-120.jpg"  % (N))
        return 'OK'
    except Exception as e: 
        return e 


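if __name__ == "__main__":
    # The __main__ guard is needed where multiprocessing starts workers with
    # "spawn" (the default on Windows, and on macOS since Python 3.8)
    files = glob.glob('image*.jpg')
    with Pool(8) as pool:
        results = pool.map(thumbnail, zip(files, range(len(files))))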

Result: 6 seconds


Method 8 - Parallel OpenCV

This is intended to be an improvement on bcattle's answer, insofar as it uses OpenCV but it also obviates the need to re-load the image to generate each new resolution output.

#!/usr/local/bin/python3

import cv2
import glob
from multiprocessing import Pool 

def thumbnail(params): 
    filename, N = params
    try:
        # Load just once, then successively scale down
        im = cv2.imread(filename)
        im = cv2.resize(im, (1600,1600))
        cv2.imwrite("t-%d-1600.jpg" % N, im) 
        im = cv2.resize(im, (720,720))
        cv2.imwrite("t-%d-720.jpg" % N, im) 
        im = cv2.resize(im, (120,120))
        cv2.imwrite("t-%d-120.jpg" % N, im) 
        return 'OK'
    except Exception as e: 
        return e 


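if __name__ == "__main__":
    # The __main__ guard is needed where multiprocessing starts workers with
    # "spawn" (the default on Windows, and on macOS since Python 3.8)
    files = glob.glob('image*.jpg')
    with Pool(8) as pool:
        results = pool.map(thumbnail, zip(files, range(len(files))))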

Result: 5 seconds
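Note that `cv2.resize(im, (1600,1600))` resizes to exactly 1600x1600, so the 4:3 test images are stretched into squares, and it uses the default bilinear interpolation. As a later comment on this page points out, `interpolation=cv2.INTER_AREA` gives much better results when shrinking. A sketch that keeps the aspect ratio (the helper names are hypothetical; OpenCV is only imported inside `cv2_thumbnail`):

```python
def thumbnail_dsize(shape, max_side):
    """(width, height) for cv2.resize that fits inside max_side x max_side.

    NumPy image arrays are (height, width, channels), but cv2.resize takes
    dsize as (width, height), so the order flips here.
    """
    h, w = shape[:2]
    scale = min(max_side / w, max_side / h, 1.0)  # never enlarge
    return max(1, round(w * scale)), max(1, round(h * scale))


def cv2_thumbnail(im, max_side):
    import cv2  # imported here so thumbnail_dsize works without OpenCV

    # INTER_AREA averages over each source region when shrinking, instead of
    # the default bilinear interpolation, which aliases on large reductions
    return cv2.resize(im, thumbnail_dsize(im.shape, max_side),
                      interpolation=cv2.INTER_AREA)
```

For example, `cv2_thumbnail(cv2.imread("image1.jpg"), 120)` would return a 120x90 image for one of the iPhone photos.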

Mark Setchell
  • Nice comparison, Mark – fmw42 Aug 13 '18 at 15:54
  • This answer is vastly superior to all of the above (and accepted) answers – Beracah Nov 06 '18 at 14:17
  • Did you use vanilla PIL or Pillow-simd? – Austin May 01 '19 at 19:39
  • @Austin Vanilla PIL. – Mark Setchell May 01 '19 at 19:46
  • Thanks, yeah this lines up with what I've found. Using Pillow-simd+multiprocessing+libjpegturbo I'm averaging around 1.2sec for a (3840,2160)->(512,512) resize. Still a bit painful for several thousand images :/ – Austin May 01 '19 at 19:57
  • Hi Mark, one other point might be that vips and ImageMagick are using adaptive Lanczos3, so about a 16-point kernel for the 1600 output case. OpenCV resize() defaults to simple bilinear, so it has a speed advantage but will produce markedly worse-quality output. I'd expect bad moiré effects, especially at the smaller sizes. – jcupitt Aug 27 '19 at 14:51
  • ... I added an answer showing the cv2 aliasing problems. – jcupitt Aug 27 '19 at 15:35
  • @jcupitt Thank you for the heads-up. You make a good point - as always. – Mark Setchell Aug 27 '19 at 18:10
  • Does PIL/Pillow release the GIL? If so, I wonder how threading would do since it has lower overhead than multiprocessing. It would be easy enough with `multiprocessing.dummy` – Justin Winokur Dec 31 '19 at 18:00
  • I ran method 7 and method 8 on 12,779 images with a single thumbnail of size 320 x 320. Method 7 takes 207.61 seconds, while method 8 takes 930.33 seconds. I wonder if I'm the only one with such disparity. – nikhilweee Oct 16 '20 at 14:02
  • Do all these techniques respect aspect ratios, or does one need to do something to take care of that? – Nikhil VJ Jan 18 '22 at 15:28
  • how about cropped square thumbnails? – Nikhil VJ Jan 18 '22 at 15:31

You want PIL; it does this with ease.

from PIL import Image

sizes = [(120,120), (720,720), (1600,1600)]
files = ['a.jpg','b.jpg','c.jpg']

for image in files:
    for size in sizes:
        im = Image.open(image)
        im.thumbnail(size)
        # Keep the .jpg extension last so Pillow can infer the output format
        im.save("thumbnail_%dx%d_%s" % (size[0], size[1], image))

If you desperately need speed, then thread it, multiprocess it, or use another language.

idbrii
Jakob Bowyer
  • Latest version of PIL no longer supports `import Image`; you should instead use `from PIL import Image` – Joakim Nov 10 '14 at 23:23
  • Also, this code will only save 3 thumbnails although it will generate all 9 thumbnails (you probably have to use `"thumbnail_%s_%s-%s" % (image, size[0], size[1])`). – Matt3o12 Feb 18 '15 at 21:50
  • Seems odd to load the same high-res image 3 times from disk to generate 3 thumbnails. Why not load the high-res image once, scale down to 1600, write, scale down to 720, write, scale down to 120 and write? Surely has to be faster. – Mark Setchell Aug 11 '18 at 22:35

A little late to the question (only a year!), but I'll piggyback on the "multiprocess it" part of @JakobBowyer's answer.

This is a good example of an embarrassingly parallel problem, as the main bit of code doesn't mutate any state external to itself. It simply reads an input, performs its computation and saves the result.

Python is actually pretty good at these kinds of problems thanks to the map function provided by multiprocessing.Pool.

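from itertools import product
from multiprocessing import Pool
from PIL import Image

def thumbnail(image_details):
    size, filename = image_details
    try:
        im = Image.open(filename)
        im.thumbnail(size)
        # Put the size in the name so the three outputs don't overwrite each other
        im.save("thumbnail_%d_%s" % (size[0], filename))
        return 'OK'
    except Exception as e:
        return e

sizes = [(120,120), (720,720), (1600,1600)]
files = ['a.jpg','b.jpg','c.jpg']

if __name__ == "__main__":
    # Pool() uses all CPU cores by default; product() pairs every size with every file
    with Pool() as pool:
        results = pool.map(thumbnail, product(sizes, files))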

The core of the code is exactly the same as @JakobBowyer's, but instead of running it in a loop in a single thread, we wrap it in a function and spread it out across multiple cores via the multiprocessing map function.

idbrii
Audionautics
  • Don't you want a Cartesian product rather than `zip` though? – Mechanical snail Dec 17 '13 at 13:17
  • The `zip` refers to [this function](http://docs.python.org/2.7/library/functions.html#zip), not the compressed file format. – Nick Mar 21 '14 at 08:15
  • does this above script add any benefit if numbers of cores is set to one? – avi Jul 19 '14 at 15:38
  • @Nick lookup Cartesian product. This will generate the thumbnail for the first image at 120x120, for the second at 720x720, and for the last image at 1600x1600 – The Tahaan Dec 06 '16 at 06:24

Another option is to use the Python bindings to OpenCV. This may be faster than PIL or ImageMagick.

import cv2

sizes = [(120, 120), (720, 720), (1600, 1600)]
image = cv2.imread("input.jpg")
for size in sizes:
    resized_image = cv2.resize(image, size)
    cv2.imwrite("thumbnail_%d.jpg" % size[0], resized_image) 

There's a more complete walkthrough here.

If you want to run it in parallel, use concurrent.futures on Py3 or the futures package on Py2.7:

import concurrent.futures
import cv2

def resize(input_filename, size):
    image = cv2.imread(input_filename)
    resized_image = cv2.resize(image, size)
    cv2.imwrite("thumbnail_%s%d.jpg" % (input_filename.split('.')[0], size[0]), resized_image)

sizes = [(120, 120), (720, 720), (1600, 1600)]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(resize, "input.jpg", size) for size in sizes]
    for future in futures:
        future.result()  # re-raises any exception from the worker thread
bcattle

One more answer, since (I think?) no one has mentioned quality.

Here's a photo I took with an iPhone 6S at the Olympic park in East London:

[Image: roof of the Olympic swimming pool]

The roof is made from a set of wooden slats, and unless you downsize rather carefully you'll get very nasty moiré effects. I had to compress the image quite heavily to upload it to Stack Overflow; if you're interested, the original is here.

Here's cv2 resize:

$ python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> x = cv2.imread("IMG_1869.JPG")
>>> y = cv2.resize(x, (120, 90))
>>> cv2.imwrite("cv2.png", y)
True

Here's vipsthumbnail:

$ vipsthumbnail IMG_1869.JPG -s 120 -o vips.png

And here are the two downsized images side-by-side and zoomed by x2, with vipsthumbnail on the left:

[Image: results of downsizing to 120 pixels across]

(ImageMagick gives the same results as vipsthumbnail)

cv2 defaults to bilinear interpolation (INTER_LINEAR), so it has a fixed 2x2 mask. For every point in the output image, it calculates the corresponding point in the input and takes a weighted average of the 2x2 neighbourhood there. This means it's really only sampling at most 240 of the 4032 points in each line, and simply ignoring the rest! This produces ugly aliasing.
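To make the "240 out of 4032" point concrete, here is a simplified one-dimensional model that counts which source pixels in one row a 2-tap bilinear shrink and an area-averaging (box) shrink actually read. It assumes the half-pixel-centre coordinate mapping that bilinear resizers commonly use; it is a sketch, not OpenCV's actual code:

```python
import math


def bilinear_taps(in_width, out_width):
    """Source pixels a 2-tap (bilinear) shrink reads along one row."""
    scale = in_width / out_width
    taps = set()
    for x in range(out_width):
        # Map the centre of output pixel x back into input coordinates
        src = (x + 0.5) * scale - 0.5
        left = max(0, min(in_width - 1, math.floor(src)))
        right = min(in_width - 1, left + 1)
        taps.update((left, right))  # only these two pixels contribute
    return taps


def box_taps(in_width, out_width):
    """Source pixels an area-averaging (box) shrink reads along one row."""
    scale = in_width / out_width
    taps = set()
    for x in range(out_width):
        # Every input pixel overlapping output pixel x's footprint is used
        start = math.floor(x * scale)
        end = min(in_width, math.ceil((x + 1) * scale))
        taps.update(range(start, end))
    return taps


if __name__ == "__main__":
    # One row of the 4032-pixel-wide photo shrunk to 120 pixels across
    print(len(bilinear_taps(4032, 120)), "of 4032 pixels read (bilinear)")
    print(len(box_taps(4032, 120)), "of 4032 pixels read (box)")
```

Run as a script, this reports 240 pixels for the bilinear case and all 4032 for the box filter, which is why thin slats can vanish or alias with bilinear shrinking.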

vipsthumbnail is doing a more complex three-stage downsize.

  1. It uses the libjpeg shrink-on-load feature to shrink the image by a factor of 8 in each axis with a box filter, turning the 4032-pixel-wide image into 504 x 378 pixels.
  2. It does a further 2 x 2 box filter shrink to get 252 x 189 pixels.
  3. It finishes with a 5 x 5 Lanczos3 kernel to get the output 120 x 90 pixel image.

This is supposed to give equivalent quality to a full Lanczos3 kernel, but be quicker because it can box filter most of the way.
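The sizes in the three steps above follow from simple arithmetic. The sketch below reproduces them with a hypothetical rule (it is not libvips's actual logic): take the largest libjpeg shrink-on-load factor (1, 2, 4 or 8), then the largest integer box shrink, that still leave at least a 2x reduction for the final Lanczos3 step where possible. It only computes sizes; it does no resampling:

```python
def plan_shrink(width, height, target, jpeg_factors=(8, 4, 2, 1)):
    """Sizes after each stage of a three-stage shrink to fit `target` pixels.

    Hypothetical rule, not libvips's real logic: use the largest JPEG
    shrink-on-load factor, then the largest integer box shrink, that still
    leave at least a 2x reduction for the final Lanczos3 resample.
    """
    total = max(width, height) / target  # overall reduction needed
    # Stage 1: libjpeg shrink-on-load (falls back to 1 for small reductions)
    jpeg = next((f for f in jpeg_factors if f <= total / 2), 1)
    w1, h1 = width // jpeg, height // jpeg
    # Stage 2: integer box-filter shrink
    box = max(1, int(total / jpeg / 2))
    w2, h2 = w1 // box, h1 // box
    # Stage 3: size after the final (Lanczos3) resample to the target
    scale = target / max(w2, h2)
    return [(w1, h1), (w2, h2), (round(w2 * scale), round(h2 * scale))]


if __name__ == "__main__":
    for stage in plan_shrink(4032, 3024, 120):
        print(stage)
```

For the 120-pixel case this gives 504x378, then 252x189, then 120x90, matching the steps listed above.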

jcupitt
  • Yes, `cv2.resize` with the default parameters does a terrible job when shrinking by a significant amount. This is a failing shared by many other imaging libraries, they neglect to adjust the resampling kernel properly. But you need not use the defaults, you can add `interpolation=cv2.INTER_AREA` and get results comparable to `vipsthumbnail`. – Mark Ransom Nov 15 '22 at 19:47
3

If you are already familiar with ImageMagick, why not stick with the Python bindings?

PythonMagick

Don Question
  • Thanks -- is this faster than some of the builtin Python methods? – ensnare Dec 25 '11 at 20:19
  • Which builtin methods? If you mean PIL, I can't say for sure, but ImageMagick is more the Swiss Army knife than a racehorse. Nevertheless, I could never complain about the performance, but just relish the incredible features. I don't know of any other library with similar capabilities. – Don Question Dec 25 '11 at 22:31

Python 2.7, Windows, x64 users

In addition to @JakobBowyer's and @Audionautics' answers: PIL is quite old, and you can find yourself troubleshooting and hunting for the right version. Instead, use Pillow, the maintained fork of PIL, from here (source).

The updated snippet will look like this:

im = Image.open(full_path)
im.thumbnail(thumbnail_size)
im.save(new_path, "JPEG")

Full script that walks the input directory and creates a thumbnail for each JPEG:

import os
from PIL import Image

output_dir = '.\\output'
thumbnail_size = (200,200)

# Create the output directory if it doesn't exist yet
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

for dirpath, dnames, fnames in os.walk(".\\input"):
    for f in fnames:
        full_path = os.path.join(dirpath, f)
        # Case-insensitive match so .JPG files from cameras are included too
        if f.lower().endswith(".jpg"):
            filename = 'thumbnail_{0}'.format(f)
            new_path = os.path.join(output_dir, filename)

            # Remove any stale thumbnail from a previous run
            if os.path.exists(new_path):
                os.remove(new_path)

            im = Image.open(full_path)
            im.thumbnail(thumbnail_size)
            im.save(new_path, "JPEG")
Community
Jossef Harush Kadouri

I stumbled upon this when trying to figure out which library I should use:

It seems like OpenCV is clearly faster than PIL.

That said, I'm working with spreadsheets, and it turns out that the module I was using, openpyxl, already requires me to import PIL to insert images.

virtualxtc
  • It's quite easy to convert an image from OpenCV to PIL/Pillow, see [Convert opencv image format to PIL image format?](https://stackoverflow.com/q/43232813/5987) – Mark Ransom Nov 14 '22 at 17:48