Python performance, or bad usage of PIL?

Question

I am implementing basic global thresholding in Python. Part of the algorithm involves grouping pixels into two containers according to their intensities:

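# `image` is assumed to be a grayscale (mode "L") PIL image, `threshold` an int
group_1 = []  # intensities above the threshold
group_2 = []  # intensities at or below the threshold
for intensity in list(image.getdata()):
    if intensity > threshold:
        group_1.append(intensity)
    else:
        group_2.append(intensity)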

With images larger than 0.5 megapixels, this approach typically takes about 5 seconds or more. Any approach has to check every pixel, so I am wondering whether there is a faster way to do this (using other PIL methods, other data structures, or other algorithms), or whether it is simply a Python performance issue.

Why are you making a list from the data? It's an iterable, so just iterate over it. — Gareth Latty, Apr 26 '13 at 23:51
You are right, I did not have to make it a list, but iterating over the data directly was no faster than iterating over the list. — Skogen, Apr 26 '13 at 23:55
Don't iterate over each pixel. There are probably better ways to do it in PIL, but you could just `sort()` the data (C speed) and split it on the threshold value. — Mark Tolonen, Apr 26 '13 at 23:56
What you're doing there is getting a list of all the pixels, but what happens to them then? You've lost all the spatial information... There is probably a way to get what you want in a couple steps using `ImageChops`. — kindall, Apr 27 '13 at 00:20
Basic global thresholding is histogram-based, so I don't need any spatial information. — Skogen, Apr 27 '13 at 00:22
Then why not just make the histogram using `Image.histogram()` and slice it at the threshold value? That is, apply the threshold *after* making the histogram? — kindall, Apr 27 '13 at 15:51
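The two pure-PIL suggestions in the comments (sort the data and split it at the threshold, or build the histogram with `Image.histogram()` and slice it) can be sketched roughly as follows. This is a minimal sketch, assuming `image` is a single-band mode "L" image and `threshold` is an integer from 0 to 255; `split_by_sort`, `split_histogram` and `histogram_mean` are hypothetical helper names.

```python
import bisect
from PIL import Image

def split_by_sort(image, threshold):
    """Sort all pixel values once (C speed), then cut the sorted list at the threshold."""
    data = sorted(image.getdata())
    # Index of the first value strictly greater than `threshold`.
    cut = bisect.bisect_right(data, threshold)
    group_1 = data[cut:]   # intensities > threshold
    group_2 = data[:cut]   # intensities <= threshold
    return group_1, group_2

def split_histogram(image, threshold):
    """Build the 256-bin histogram once, then slice it at the threshold.

    Assumes mode "L", where hist[v] is the number of pixels with intensity v.
    """
    hist = image.histogram()
    above = hist[threshold + 1:]   # counts for intensities threshold+1 .. 255
    below = hist[:threshold + 1]   # counts for intensities 0 .. threshold
    return above, below

def histogram_mean(counts, offset=0):
    """Mean intensity of a histogram slice whose first bin is intensity `offset`."""
    total = sum(counts)
    if total == 0:
        return None  # empty group
    return sum((offset + i) * c for i, c in enumerate(counts)) / total
```

For a histogram-based method the counts are all that is needed: the mean of the bright group is `histogram_mean(above, threshold + 1)` and the mean of the dark group is `histogram_mean(below)`, so the pixel lists never have to be built.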

score 2 · Accepted Answer · edited May 23 '17 at 11:49

If you are going to be working with a large amount of numerical information, you should read the image data into numpy and manipulate the array there. The routines will be faster (and simpler) than anything you can write in pure Python.

See this question to get you started on converting between PIL images and numpy arrays:

PIL and numpy
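As a minimal sketch of that round trip (assuming a reasonably recent Pillow and NumPy), a synthetic 3x2 grayscale image stands in for a real file here; `photo.png` in the comment is a hypothetical file name.

```python
import numpy as np
from PIL import Image

# For a real file you would use something like:
#   image = Image.open("photo.png").convert("L")   # hypothetical path; "L" = 8-bit grayscale
image = Image.new("L", (3, 2), color=128)  # width 3, height 2, every pixel 128

# PIL -> numpy: a mode "L" image becomes a 2-D uint8 array of shape (height, width).
A = np.asarray(image)
print(A.shape, A.dtype)  # (2, 3) uint8

# numpy -> PIL: a 2-D uint8 array maps back to a mode "L" image.
back = Image.fromarray(A)
print(back.mode, back.size)  # L (3, 2)
```

`np.asarray` may return a read-only view of the pixel data; that is fine for thresholding, and `np.array(image)` gives a writable copy if you need to modify pixels in place.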

For example, if the image is grayscale, each element of the array `A` is simply an intensity from 0 to 255. To "threshold" it, you could do something like this:

group1 = A[A > threshold]   # 1-D array of intensities above the threshold
group2 = A[A <= threshold]  # 1-D array of intensities at or below it
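Since the question is about basic global thresholding, here is a sketch of the whole iteration built on the same boolean-mask indexing. It assumes the common textbook variant (start at the mean intensity, then repeatedly move the threshold to the midpoint of the two group means until it changes by less than a tolerance); `basic_global_threshold` and `delta` are hypothetical names, not part of NumPy or PIL.

```python
import numpy as np

def basic_global_threshold(A, delta=0.5):
    """Iterative basic global thresholding on a grayscale array (textbook variant, assumed).

    Starts from the mean intensity and moves the threshold to the midpoint of the
    two group means until it changes by less than `delta`.
    """
    A = np.asarray(A, dtype=np.float64)
    threshold = A.mean()
    while True:
        group1 = A[A > threshold]    # brighter pixels
        group2 = A[A <= threshold]   # darker pixels (ties go here)
        # Stop if one group is empty, e.g. for a constant image.
        if group1.size == 0 or group2.size == 0:
            return threshold
        new_threshold = 0.5 * (group1.mean() + group2.mean())
        if abs(new_threshold - threshold) < delta:
            return new_threshold
        threshold = new_threshold
```

Each pass is just a few vectorized array operations, with no Python-level loop over pixels. A binary mask for the result is then `A > basic_global_threshold(A)`.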
