
I have a large package of .jpg images of the sky, some of which are artificially white; these images are set to (255, 255, 255) for every pixel. I need to pick these images out. I do so by only looking at the first 5 pixels. My code is:

from PIL import Image

def is_white(imagepath):
    im = Image.open(imagepath)
    imList = list(im.getdata())[:5]  # getdata() decodes the whole image
    # Compare with ==; a single = inside an if statement is a syntax error.
    return imList == [(255, 255, 255)] * 5

However, this process takes a long time because im.getdata() returns the whole image. Is there a different function I can use to return less data, or perhaps only specific pixels? I need to look at multiple pixels because other images may have one or two pixels that are completely white, so I check 5 pixels to avoid false positives.

Craig T
  • The Pillow fork of PIL has a [`PixelAccess`](http://pillow.readthedocs.io/en/4.1.x/reference/PixelAccess.html?highlight=getpixel#pixelaccess-class) class. The image's data will still have to be loaded, but there may be less overhead than with `getdata()` because it doesn't have to convert the contents of the image into a sequence object. – martineau May 22 '17 at 20:18
  • Possible duplicate of [Get pixel's RGB using PIL](https://stackoverflow.com/questions/11064786/get-pixels-rgb-using-pil) – Mad Physicist May 22 '17 at 20:19
  • I had not found the image.load() function yet. Is it significantly faster than image.getdata()? – Craig T May 22 '17 at 20:29
  • In your case, it will be a little faster to call `load`, but I suspect that it will only be by a very small margin. – Mad Physicist May 22 '17 at 20:31
  • BTW, PIL or Pillow? – Mad Physicist May 22 '17 at 20:31
  • Relevant (possibly even duplicate): https://stackoverflow.com/q/19695249/2988730 – Mad Physicist May 22 '17 at 20:47

1 Answer


You can use the Image.getpixel method:

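from PIL import Image

im = Image.open(imagepath)
# getpixel takes an (x, y) tuple, so this reads the first five pixels of row 0.
if all(im.getpixel((x, 0)) == (255, 255, 255) for x in range(5)):
    print("Image is saturated")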

This assumes that your image has at least five pixels in each row.

Normally, reading pixels one at a time with getpixel is much slower than loading the whole image once and indexing the resulting PixelAccess object. However, because you only use a tiny fraction of the image, loading the entire thing would probably waste a lot of time.
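For comparison, here is a minimal sketch of the PixelAccess route. The function name `first_pixels_are_white` and its `count` parameter are hypothetical, and it assumes an RGB image (a grayscale JPEG would yield integers rather than tuples). `load()` decodes the image data, so this is not expected to avoid the full decode.

```python
from PIL import Image

WHITE = (255, 255, 255)


def first_pixels_are_white(im, count=5):
    """Return True if the first `count` pixels of row 0 are pure white.

    Hypothetical helper; assumes `im` is in RGB mode.
    """
    px = im.load()  # PixelAccess object, indexed as px[x, y]
    width = im.size[0]
    # Clamp to the image width so narrow images do not raise IndexError.
    return all(px[x, 0] == WHITE for x in range(min(count, width)))
```

Typical use would be `first_pixels_are_white(Image.open(imagepath))`.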

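You may be able to speed things up by calling load on a small sub-image returned by crop. crop was lazy in old PIL releases; whether it still avoids decoding the full image depends on your Pillow version, so measure before relying on it: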

strip = Image.open(imagepath).crop((0, 0, 5, 1))
px = strip.load()  # PixelAccess is indexable but not iterable
if all(px[x, 0] == (255, 255, 255) for x in range(5)):
    print("Image is saturated")
Mad Physicist
  • This does not work as written: im.getpixel(0, x) should be im.getpixel((0, x)). That said, I think this is the correct answer. – Craig T May 22 '17 at 20:35
  • @CraigT. You are right. The main thing is that you want to *avoid* calling `load` because that will read in the entire image. You really want just a tiny subset, which can be done with a tiny read from the disk rather than reading in megabytes of data. – Mad Physicist May 22 '17 at 20:38
  • @MadPhysicist I think the way PIL works is through lazy loading, it won't load *any* of the image until you ask for a pixel then it will load the *entire* image. `load` won't be slower. – Mark Ransom May 22 '17 at 20:42
  • @MarkRansom. I have that suspicion too, but it is very unfortunate. Given that `crop` is "lazy", I think it may be the only valid option here. – Mad Physicist May 22 '17 at 20:44
  • Again, I don't think `crop` will be any faster. It would be worthwhile for somebody to do some timing, but I'm not anticipating any miracles. – Mark Ransom May 22 '17 at 20:47
  • So what you are implying in that last comment is that cropping the image and then reading pixels may be the only way to make this faster? Or would crop have to load the whole image as well, making it no faster at all? – Craig T May 22 '17 at 20:47
  • @CraigT. According to [this SO thread](https://stackoverflow.com/q/19695249/2988730), that appears to be the case. Unless you want to implement your own image loader. Keep in mind that cropping may not make things faster for compressed formats. – Mad Physicist May 22 '17 at 20:48
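
The timing that the comments ask for could be sketched roughly as follows. The function names (`via_getdata`, `via_getpixel`, `via_crop`, `time_methods`) are made up for illustration, and the numbers will depend on the Pillow version, the image size, and the disk cache, so treat any result as a local measurement rather than a general answer.

```python
import timeit

from PIL import Image

WHITE = (255, 255, 255)


def via_getdata(path):
    # Original approach: build a sequence of every pixel, keep five.
    with Image.open(path) as im:
        return list(im.getdata())[:5] == [WHITE] * 5


def via_getpixel(path):
    # Five individual pixel reads from row 0.
    with Image.open(path) as im:
        return all(im.getpixel((x, 0)) == WHITE for x in range(5))


def via_crop(path):
    # Crop a 5x1 strip, then index its PixelAccess object.
    with Image.open(path) as im:
        px = im.crop((0, 0, 5, 1)).load()
        return all(px[x, 0] == WHITE for x in range(5))


def time_methods(path, repeats=5):
    """Return the best wall-clock time in seconds for each approach."""
    results = {}
    for func in (via_getdata, via_getpixel, via_crop):
        timings = timeit.repeat(lambda: func(path), number=1, repeat=repeats)
        results[func.__name__] = min(timings)
    return results
```

Calling `time_methods(imagepath)` on one of the real sky images returns a dict of timings keyed by function name, which should show whether `crop` or `getpixel` helps at all for these JPEGs.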