I have a huge dataset of images like this:

[image: example input images]

I would like to change the colors on these. All white should stay white, all purple should turn white and everything else should turn black. The desired output would look like this:

[image: desired output image]

I've written the code below and it does what I want, but it takes way too long to go through the number of pictures I have. Is there another, faster way of doing this?

import os
from PIL import Image

path = r"C:path"
for f in os.listdir(path):
    f_name = os.path.join(path, f)
    if f_name.endswith(".png"):
        im = Image.open(f_name)
        fn, fext = os.path.splitext(f_name)
        print(fn)
        im = im.convert("RGBA")
        # Visit every pixel: white and purple become white, the rest black
        for x in range(im.size[0]):
            for y in range(im.size[1]):
                if im.getpixel((x, y)) == (255, 255, 255, 255):
                    im.putpixel((x, y), (255, 255, 255, 255))
                elif im.getpixel((x, y)) == (128, 64, 128, 255):
                    im.putpixel((x, y), (255, 255, 255, 255))
                else:
                    im.putpixel((x, y), (0, 0, 0, 255))

        im.show()
HansHirse

2 Answers

Your images seem to be palettised as they represent segmentations, or labelled classes and there are typically fewer than 256 classes. As such, each pixel is just a label (or class number) and the actual colours are looked up in a 256-element table, i.e. the palette.

Have a look here if you are unfamiliar with palettised images.
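
As a small illustration of pixels being palette indices, here is a sketch that builds a tiny palettised image in memory. The 2x2 size and the three-entry palette are made up for the example and are not taken from your images.

```python
import numpy as np
from PIL import Image

# Build a tiny 2x2 palettised image (mode 'P'), purely for illustration
im = Image.new('P', (2, 2))

# Made-up palette: index 0 = black, index 1 = purple, index 2 = white
im.putpalette([0, 0, 0, 128, 64, 128, 255, 255, 255])

# Each pixel stores only an index into the palette, not a colour
im.putpixel((0, 0), 1)
im.putpixel((1, 0), 2)

# 2x2 array of palette indices
indices = np.array(im)

# The actual colours only appear once the indices are looked up in the palette
rgb = np.array(im.convert('RGB'))

print(indices)
print(rgb[0, 0])
```

Changing one palette entry therefore recolours every pixel that carries that index, without touching the pixel data at all.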

So, you don't need to iterate over all 12 million pixels, you can instead just iterate over the palette which is only 256 elements long...

#!/usr/bin/env python3

import sys
import numpy as np
from PIL import Image

# Load image
im = Image.open('image.png')

# Check it is palettised as expected
if im.mode != 'P':
    sys.exit("ERROR: Was expecting a palettised image")

# Get palette and make into Numpy array of RGB triplets (up to 256 entries;
# newer Pillow versions may return a palette shorter than 256 colours)
palette = np.array(im.getpalette(), dtype=np.uint8).reshape((-1, 3))

# Name our colours for readability
purple = [128,64,128]
white  = [255,255,255]
black  = [0,0,0]

# Go through palette, setting purple to white
palette[np.all(palette==purple, axis=-1)] = white

# Go through palette, setting anything not white to black
palette[~np.all(palette==white, axis=-1)] = black

# Apply our modified palette and save
im.putpalette(palette.ravel().tolist())
im.save('result.png')

That takes 290ms including loading and saving the image.


If you have many thousands of images to do, and you are on a decent OS, you can use GNU Parallel. Change the above code to accept a command-line parameter which is the name of the image, and save it as recolour.py then use:

parallel ./recolour.py {} ::: *.png

It will keep all your CPU cores busy until every image has been processed.
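
A minimal sketch of what recolour.py could look like once it takes the image name from the command line. The `recolour` function name and the `_recoloured.png` output naming are assumptions made for this example; the palette logic is the same as above.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

import numpy as np
from PIL import Image


def recolour(in_path, out_path):
    """Turn purple white and every other non-white palette entry black."""
    im = Image.open(in_path)
    if im.mode != 'P':
        raise ValueError(f"{in_path}: was expecting a palettised image")

    # reshape(-1, 3) also copes with palettes shorter than 256 entries
    palette = np.array(im.getpalette(), dtype=np.uint8).reshape(-1, 3)
    palette[np.all(palette == [128, 64, 128], axis=-1)] = [255, 255, 255]
    palette[~np.all(palette == [255, 255, 255], axis=-1)] = [0, 0, 0]

    im.putpalette(palette.ravel().tolist())
    im.save(out_path)


if __name__ == '__main__':
    # Each command-line argument is the name of one image to process
    for name in sys.argv[1:]:
        src = Path(name)
        # Hypothetical naming scheme: image.png -> image_recoloured.png
        recolour(src, src.with_name(src.stem + '_recoloured.png'))
```

Each input gets its own output file, so parallel jobs do not overwrite each other's results. Because the outputs also end in .png, a second run over *.png would pick them up too, so writing to a separate directory may be preferable.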

Keywords: Image processing, Python, Numpy, PIL, Pillow, palette, getpalette, putpalette, classes, classification, label, labels, labelled image.

Mark Setchell

If you're open to using NumPy, you can heavily speed up pixel manipulations:

from PIL import Image
import numpy as np

# Open PIL image
im = Image.open('path/to/your/image.png').convert('RGBA')

# Convert to NumPy array
pixels = np.array(im)

# Get logical indices of all white and purple pixels
idx_white = (pixels == (255, 255, 255, 255)).all(axis=2)
idx_purple = (pixels == (128, 64, 128, 255)).all(axis=2)

# Generate black image; set alpha channel to 255
out = np.zeros(pixels.shape, np.uint8)
out[:, :, 3] = 255

# Set white and purple pixels to white
out[idx_white | idx_purple] = (255, 255, 255, 255)

# Convert back to PIL image
im = Image.fromarray(out)

That code generates the desired output, and takes around 1 second on my machine, whereas your loop code needs 33 seconds.
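
To run the same steps over a whole folder, as in your loop, something like the following sketch might work. The `recolour` and `process_folder` names and the separate output directory are assumptions for this example, not part of your code.

```python
from pathlib import Path

import numpy as np
from PIL import Image

WHITE = (255, 255, 255, 255)
PURPLE = (128, 64, 128, 255)


def recolour(im):
    """Return a new image: white and purple pixels white, all others black."""
    pixels = np.array(im.convert('RGBA'))
    keep = (pixels == WHITE).all(axis=2) | (pixels == PURPLE).all(axis=2)

    out = np.zeros(pixels.shape, np.uint8)
    out[:, :, 3] = 255   # opaque black everywhere
    out[keep] = WHITE    # white where the input was white or purple
    return Image.fromarray(out)


def process_folder(src_dir, dst_dir):
    """Recolour every .png in src_dir and save it under the same name in dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for f in sorted(Path(src_dir).glob('*.png')):
        with Image.open(f) as im:
            recolour(im).save(dst / f.name)
```

Calling `process_folder('input_dir', 'output_dir')` (both directory names are placeholders) leaves the originals untouched and writes the results next to them in a separate folder.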

Hope that helps!

HansHirse