Context
I often find myself in the following situation:
- I have a list of image filenames I need to process
- I read each image sequentially, using, for instance, scipy.misc.imread
- Then I do some kind of processing on each image and return a result
- I save the result, along with the image filename, into a Shelf
The problem is that simply reading an image takes a non-negligible amount of time, sometimes comparable to, or even longer than, the image processing itself.
Question
So I was thinking that, ideally, I could read image n + 1 while processing image n, or, even better, read and process multiple images at once in an automagically determined optimal way.
I have read about multiprocessing, threads, Twisted, gevent and the like, but I can't figure out which one to use or how to implement this idea. Does anyone have a solution to this kind of issue?
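To make the first idea concrete, here is a minimal, untested sketch of what I have in mind: a background thread reads images ahead into a bounded Queue while the main loop processes them. The names files and process_image are those from the minimal example below, and the queue depth of 2 is an arbitrary guess, not an automatically determined optimum.

import threading
import Queue  # called "queue" on Python 3

import scipy.misc

def prefetch(filenames, depth=2):
    # read images in a background thread, staying at most `depth` images ahead
    q = Queue.Queue(maxsize=depth)
    def reader():
        for f in filenames:
            q.put(scipy.misc.imread(f))  # blocks while the queue is full
        q.put(None)                      # sentinel: nothing left to read
    t = threading.Thread(target=reader)
    t.daemon = True
    t.start()
    while True:
        im = q.get()
        if im is None:
            break
        yield im

# files and process_image as defined in the minimal example below
for im in prefetch(files):
    print process_image(im)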
Minimal example
import scipy.misc
import scipy.ndimage

# generate a list of images
scipy.misc.imsave("lena.png", scipy.misc.lena())
files = ['lena.png'] * 100

# a simple image processing task
def process_image(im, threshold=128):
    label, n = scipy.ndimage.label(im > threshold)
    return n

# my current main loop
for f in files:
    im = scipy.misc.imread(f)
    print process_image(im)
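And here is a rough, untested sketch of the same loop using multiprocessing.Pool, where each worker both reads and processes one image, so reads and computations on different files overlap. The pool size of 4 is a guess rather than the automagically determined optimum I am hoping for.

import multiprocessing

def read_and_process(f):
    # each worker reads one image and processes it, so file n + 1 can be
    # read while another worker is still busy processing file n
    return process_image(scipy.misc.imread(f))

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    for n in pool.imap(read_and_process, files):
        print n
    pool.close()
    pool.join()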