I have some code that does (independent) operations on a bunch of Python Imaging Library (PIL) Image objects. I would like to try and speed this up using parallel processing, so I read up on the multiprocessing module below:

http://docs.python.org/library/multiprocessing.html

But it's still not very clear to me how to use multiprocessing for this problem.

Conceptually, it looks like I could use a multiprocessing.Queue of Image objects and a Pool of workers. But the Image objects seem to be unpicklable:

UnpickleableError: Cannot pickle <type 'ImagingCore'> objects

Is there a better way to process PIL images in parallel?

nbro
M-V

3 Answers

If you get the image objects from files, you can just send the filenames to the workers and let them open the images themselves.
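
As a rough sketch of the filename approach with PIL itself: the worker below receives only a path, opens the image on its own, and saves a thumbnail. The function name make_thumbnail, the 128x128 size and the "_thumb.png" suffix are made up for illustration, and the code assumes the Pillow fork of PIL.

```python
import os
import sys
from multiprocessing import Pool

from PIL import Image


def make_thumbnail(fname):
    """Open one image inside the worker, shrink it, and save a copy.

    Only the filename (a plain string) crosses the process boundary,
    so no Image object ever has to be pickled.
    """
    # Hypothetical naming scheme: photo.jpg -> photo_thumb.png
    out_name = os.path.splitext(fname)[0] + '_thumb.png'
    with Image.open(fname) as img:
        img.thumbnail((128, 128))  # resizes in place, keeps aspect ratio
        img.save(out_name)
    return out_name


def main(argv):
    filenames = argv[1:]
    if not filenames:
        return
    pool = Pool()  # by default, one worker per CPU core
    try:
        # Each worker opens, processes and saves its own files.
        for out_name in pool.map(make_thumbnail, filenames):
            print(out_name)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    main(sys.argv)
```

Returning the output filename rather than the image keeps the result picklable as well.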

Otherwise, you can send the image data (with Image.getdata()), along with the size and pixel format, and have the workers reconstruct the image using Image.new() and Image.putdata().
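
A minimal sketch of that second idea, with one substitution: instead of getdata()/putdata() it uses Pillow's tobytes() and Image.frombytes(), which carry the same pixels as a single byte string. The helpers pack, unpack and invert_worker are hypothetical names, and inverting the colours is just a stand-in for the real per-image work.

```python
from multiprocessing import Pool

from PIL import Image, ImageOps


def pack(img):
    """Reduce an Image to picklable parts: mode, size and raw bytes."""
    return img.mode, img.size, img.tobytes()


def unpack(payload):
    """Rebuild an Image from the tuple produced by pack()."""
    mode, size, data = payload
    return Image.frombytes(mode, size, data)


def invert_worker(payload):
    """Stand-in job: invert the colours and return the packed result."""
    img = unpack(payload)
    return pack(ImageOps.invert(img))


if __name__ == '__main__':
    # Images that exist only in memory, so filenames are not an option.
    images = [Image.new('RGB', (64, 64), (10, 20, 30)) for _ in range(4)]
    pool = Pool()
    try:
        packed = pool.map(invert_worker, [pack(i) for i in images])
    finally:
        pool.close()
        pool.join()
    results = [unpack(p) for p in packed]
    print(results[0].getpixel((0, 0)))
```

Every image is copied to the worker and back, so this probably only pays off when the per-image work is expensive compared to the copying.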

nneonneo
  • Thanks. Am I on the right track to use a Queue and a pool of workers? – M-V Sep 16 '12 at 06:19
  • Sounds like it. `multiprocessing` is the module to go for if you want to do this sort of CPU-bound/memory-bound parallel processing with Python. – nneonneo Sep 16 '12 at 06:21
  • @M-V: If you use a Queue to feed image data to the workers, loading all images in the parent process might become a bottleneck. And you'd have additional overhead with the pickling/unpickling of the data. If you just feed the filenames to the workers, they can do the image loading in parallel too. – Roland Smith Sep 16 '12 at 11:20
  • Thanks nneonneo and Roland for the idea of using filenames. – M-V Sep 16 '12 at 12:04

Just put the names of the files in a list, and let the worker processes handle them. The example below uses ImageMagick in a subprocess to do some image conversion from an obscure format. But the same principle can be used with PIL. Just replace the contents of the processfile() function. This is a program that I use frequently to convert DICOM files (a format used in medical imaging, from an X-ray machine in this case) to PNG format.

"""Convert DICOM files to PNG format, remove blank areas. The blank area
   removal is based on the image size of a Philips flat detector. The image
   goes from 2048x2048 pixels to 1574x2048 pixels."""

import os
import sys
import subprocess
from multiprocessing import Pool, Lock

globallock = Lock()

def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- string or list of strings containing a command to test
    """
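    if isinstance(args, str):
        args = args.split()
    try:
        # Discard the program's output; we only care that it can be started.
        with open(os.devnull, 'w') as f:
            subprocess.call(args, stderr=subprocess.STDOUT, stdout=f)
    except OSError:
        # Raised when the executable cannot be found or started.
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)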

def processfile(fname):
    """Use the convert(1) program from the ImageMagick suite to convert the
       image and crop it.

    Arguments:
    fname -- string containing the name of the file to process
    """
    size = '1574x2048'
    args = ['convert', fname, '-units', 'PixelsPerInch', '-density', '300',
            '-crop', size+'+232+0', '-page', size+'+0+0', fname+'.png']
    rv = subprocess.call(args)
    # Serialize printing so messages from different workers don't interleave.
    with globallock:
        if rv != 0:
            print("Error '{}' when processing file '{}'.".format(rv, fname))
        else:
            print("File '{}' processed.".format(fname))

def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        # If no filenames are given, print a usage message.
        path, binary = os.path.split(argv[0])
        print("Usage: {} [file ...]".format(binary))
        sys.exit(0)
    # Verify that the convert program that we need is available.
    checkfor('convert')
    # Apply the processfile() function to all files in parallel.
    p = Pool()
    p.map(processfile, argv[1:])
    p.close()
    # Wait for the worker processes to exit before returning.
    p.join()

if __name__ == '__main__':
    main(sys.argv)
Roland Smith

Another option is to convert the PIL images into numpy arrays, which are pickleable.

from __future__ import print_function
import numpy
import pickle

my_array = numpy.array([1,2,3])
pickled_array = pickle.dumps(my_array)
print('The pickled version of %s is:\n\n%s.' % (my_array, pickled_array))
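
To tie this back to PIL, here is a small sketch of the round trip from an Image to an array and back. It assumes Pillow and numpy are installed; the 4x2 orange test image is made up for illustration.

```python
import pickle

import numpy
from PIL import Image

# A tiny in-memory image standing in for real data.
img = Image.new('RGB', (4, 2), (255, 128, 0))

# Image -> array: shape is (height, width, channels), dtype uint8.
arr = numpy.array(img)

# Arrays pickle cleanly, so they can pass through a Queue or a Pool.
restored = pickle.loads(pickle.dumps(arr))

# Array -> Image on the receiving side.
img2 = Image.fromarray(restored)
print(arr.shape, img2.size, img2.mode)
```

Note that the array shape is (height, width, channels), while Image.size is (width, height).
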
Community
mkohler