
I have built a pixel classifier for images: for each pixel in an image, I want to determine which pre-defined color cluster it belongs to. It works, but at roughly 5 minutes per image, I suspect I am doing something unpythonic that can surely be optimized.

How can we map the function directly over the list of lists?

import math
from tqdm import tqdm

#First I convert my image to a list
#The list below is repeated to represent a true image size
list1=[[255, 114, 70],
[120, 89, 15],
[247, 190, 6],
[41, 38, 37],
[102, 102, 10],
[255,255,255]]*3583180

Then we define the clusters to map the colors to, and the function to do so (taken from the PIL library):

#Colors of interest
RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
BLACK=[41, 38, 37]
GREY=[102, 102, 10]
WHITE=[255,255,255]

Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]

#Function to find the closest cluster by Euclidean (square root of summed squares) RGB distance
def distance(c1, c2):
    (r1,g1,b1) = c1
    (r2,g2,b2) = c2
    return math.sqrt((r1 - r2)**2 + (g1 - g2) ** 2 + (b1 - b2) **2)

What remains is to match every pixel and build a new list with the matched indexes from the original Colors:

Filt_lab=[]

#Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    closest_colors = sorted(Colors, key=lambda color: distance(color, pixel))
    closest_color = closest_colors[0]

    for num, clust in enumerate(Colors):
        if list(clust) == list(closest_color):
            Filt_lab.append(num)

Running a single image takes approximately 5 minutes, which is workable, but is there a method by which this time can be greatly reduced?

36%|███▌ | 7691707/21499080 [01:50<03:18, 69721.86it/s]

Expected outcome of Filt_lab:

[0, 1, 2, 4, 3, 5]*3583180
Rivered
  • Not an expert in high performance computing; nonetheless, my hunch is: drop the list of lists. This should be flat arrays. Your image should be `[r1, g1, b1, r2, g2, b2, ...]` or 3 arrays `[r1, r2, ...], [g1, g2, ...], [b1, b2, ...]` or a mux (use the first 8 bits of an int for r, the next 8 for g...). Then you should NOT compute differences pixel per pixel in a for loop, because you are then using highly non-consecutive memory regions. For each class, calculate the diff over the red arrays, then green, then blue. Then sum the diffs, then classify. And use numpy to make it all vectorized. – GPI Aug 04 '21 at 13:49
  • Thanks. I now have per color an array with distance values. The only thing to figure out now is how to find, from these 10 arrays, the one with the lowest value per pixel and map it to the dictionary? (A minimal sketch of that step follows these comments.) – Rivered Aug 04 '21 at 15:46
  • better to use Fortran for such problems rather than python – Tejas Shetty Aug 07 '21 at 04:36
  • For questions about improving code (as opposed to fixing issues), CodeReview.SE is the appropriate site. SO is for non-working code. – outis Oct 14 '21 at 17:33
  • Please note that newbedev is a Stack Overflow scraper; please don't link to it. Instead, google the text (optionally with `site:stackoverflow.com`) and find the correct on-site link, instead of giving scrapers more traffic that they don't deserve. – Zoe Dec 26 '21 at 15:38
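
A minimal sketch of the stack-and-argmin step asked about in the comments above; `dist_arrays` is a hypothetical stand-in for the per-color distance arrays:

import numpy as np

#Hypothetical per-color distance arrays (placeholder data)
dist_arrays = [np.random.rand(4, 4) for _ in range(6)]
stacked = np.stack(dist_arrays)           #axes: [color, row, col]
class_index = np.argmin(stacked, axis=0)  #per-pixel index of the nearest color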

7 Answers


You can use Numba's JIT to speed up the code by a large margin. The idea is to build classified_pixels on the fly by iterating over the colours for each pixel. The colours are stored in a Numpy array where the index is the colour key. The whole computation can run in parallel. This avoids creating many temporary arrays that would have to be written to and read from memory, and avoids a lot of allocation. Moreover, the data types can be adapted so that the resulting array is smaller in memory (and thus written/read faster). Here is the final script:

import numpy as np
import numba as nb

@nb.njit('int32[:,::1](int32[:,:,::1], int32[:,::1])', parallel=True)
def classify(image, colors):
    classified_pixels = np.empty((image.shape[0], image.shape[1]), dtype=np.int32)
    for i in nb.prange(image.shape[0]):
        for j in range(image.shape[1]):
            minId = -1
            minValue = 3 * 255**2 + 1 # Larger than any possible squared RGB distance
            ir, ig, ib = image[i, j]
            # Find the color index with the minimum difference
            for k in range(len(colors)):
                cr, cg, cb = colors[k]
                total = (ir-cr)**2 + (ig-cg)**2 + (ib-cb)**2
                if total < minValue:
                    minValue = total
                    minId = k
            classified_pixels[i, j] = minId
    return classified_pixels

# Representative image
np.random.seed(42)
imarray = np.random.rand(3650,2000,3) * 255
image = imarray.astype(np.int32)

# Colors of interest
RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# Build a Numpy array rather than a dict
colors = np.array([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE], dtype=np.int32)

# Actual classification
classified_pixels = classify(image, colors)

# Convert array to list
cl_pixel_list = classified_pixels.reshape(classified_pixels.shape[0] * classified_pixels.shape[1]).tolist()

# Print
print(cl_pixel_list[0:10])

This implementation takes about 0.19 seconds on my 6-core machine. It is about 15 times faster than the most recent answer provided so far and more than a thousand times faster than the initial implementation. Note that about half the time is spent in tolist(), since the classify function itself is very fast.
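
Since about half the time goes into tolist(), the conversion can be skipped entirely if (an assumption about the rest of your pipeline) the downstream code can consume a numpy array rather than a Python list:

cl_pixels_flat = classified_pixels.ravel()  # 1-D view: no copy and no per-element Python objects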

Jérôme Richard
  • I had a hard time installing numba on my unsupported Linux distribution. When I run your script it tells me: **NumbaWarning: The TBB threading layer requires TBB version 2019.5 or later i.e**. Also the output of **cl_pixel_list[0:10]** is **[5, 1, 3, 1, 0, 4, 1, 4, 3, 3]**, which is different from **[4, 1, 4, 3, 3, 4, 0, 2, 4, 4]** – Rivered Aug 11 '21 at 14:29
  • Numba supports [multiple threading backends](https://numba.readthedocs.io/en/stable/user/threading-layer.html). I think using OpenMP is better because it tends to be widely supported and efficient. So, if you get issues with TBB, consider using OpenMP. If you get issues with OpenMP, there is a default embedded threading runtime. If you still have issues with all the other backends, you can just remove `parallel=True` and replace the `prange` with `range`, resulting in a sequential implementation (sketched after these comments). The resulting code should still be much faster than the current alternative solutions. – Jérôme Richard Aug 11 '21 at 20:34
  • I checked the code and I got the same results as your last answer *with the same seed* (posted on 4 August). Note that I manually set the seed to 42 to get reproducible results. So if you want to compare the result of this code with another one, you need to set the seed (to the same value) or use the *same input values* for both codes. – Jérôme Richard Aug 11 '21 at 20:45
  • Thanks, in any case your answer is well thought out and the fastest! – Rivered Aug 13 '21 at 10:14
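
For completeness, a minimal sequential variant following the fallback described in the comments above (only a sketch: `parallel=True` removed and `nb.prange` replaced by `range`; the logic is otherwise identical to `classify`):

import numpy as np
import numba as nb

@nb.njit('int32[:,::1](int32[:,:,::1], int32[:,::1])')
def classify_sequential(image, colors):
    classified_pixels = np.empty((image.shape[0], image.shape[1]), dtype=np.int32)
    for i in range(image.shape[0]):      # plain range: no threading layer needed
        for j in range(image.shape[1]):
            minId = -1
            minValue = 3 * 255**2 + 1    # larger than any squared RGB distance
            ir, ig, ib = image[i, j]
            # Find the color index with the minimum squared difference
            for k in range(len(colors)):
                cr, cg, cb = colors[k]
                total = (ir-cr)**2 + (ig-cg)**2 + (ib-cb)**2
                if total < minValue:
                    minValue = total
                    minId = k
            classified_pixels[i, j] = minId
    return classified_pixels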

Using numpy:

import numpy as np

#Representative image
imarray = np.uint64(np.random.rand(3583180*6,3) * 255)
#Or build it with np.uint64(your_list_of_lists) if you already have that list of lists; axes: pixel, color_channels

RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
BLACK=[41, 38, 37]
GREY=[102, 102, 10]
WHITE=[255,255,255]
#your list of colors
Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]
#again converted to numpy
Colors_np = np.uint64(Colors) #axes: colors, color_channels

#Compute all distances, or rather their squares: squaring has no
#effect on which distance is minimal, so we can drop the sqrt computation
#Extend both numpy arrays to have axes [pixel, color, color_channels] with `np.newaxis`,
#take the difference,
#then the square,
#and then the sum across color channels
distances = np.sum((imarray[:,np.newaxis, :] - Colors_np[np.newaxis, :, :])**2, 2)
#the difference has axes [pixel, color, color_channels]; summed over axis 2 => [pixel, color] axes remain
#You want index of minimum over color axis, so:
closest_color_indices = np.argmin(distances, 1)

#written as one line and timed with %timeit in ipython (on a single core):
#%timeit np.argmin(np.sum((imarray[:,np.newaxis, :] - Colors_np[np.newaxis, :, :])**2, 2), 1)
#6.11 s +- 79.4 ms per loop (mean +- std. dev. of 7 runs, 1 loop each)

So this takes about 6.11s for 3583180*6=21499080 pixels and 6 possible colors.

Koen G.

it seems your computer is quite fast :)

this is your code's halfway output on my system:

  0%|          | 5635/21499080 [00:44<46:51:14, 127.43it/s]

but I have rewritten your code using TensorFlow, and now it runs in about 3 seconds :)

import math
import os
from time import time

import numpy as np
from tqdm import tqdm

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # or any {'0', '1', '2'}
import tensorflow as tf

list1 = [[255, 114, 70],
         [120, 89, 15],
         [247, 190, 6],
         [41, 38, 37],
         [102, 102, 10],
         [255, 255, 255]] * 3583180
list1_np = tf.constant(list1)
RED = [255, 114, 70]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]
Colors = tf.constant([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])
t = time()
ans = tf.argmin(np.array([tf.math.reduce_sum((list1_np - c) ** 2, axis=1) for c in Colors]), axis=0)
print(time() - t)
print(ans)


# and now your code

def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return math.sqrt((r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2)


t = time()
Filt_lab = []

# Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    closest_colors = sorted(Colors, key=lambda color: distance(color, pixel))
    closest_color = closest_colors[0]

    for num, clust in enumerate(Colors):
        if list(clust) == list(closest_color):
            Filt_lab.append(num)
print(time() - t)

the output is:

3.1714584827423096
tf.Tensor([0 1 2 ... 4 3 5], shape=(21499080,), dtype=int64)
  0%|          | 951/21499080 [00:07<47:36:50, 125.42it/s]

NOTE1: you can omit some of the imports if you delete the second part.

NOTE2: when you only compare distances with each other, there is no need for the square root: sqrt is monotonically increasing, so the color with the smallest squared distance also has the smallest true distance.
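
A tiny sanity check of NOTE2 (toy numbers, not from the benchmark above):

import math

d_squared = [9, 4, 25]  # toy squared distances
by_square = min(range(len(d_squared)), key=lambda i: d_squared[i])
by_root = min(range(len(d_squared)), key=lambda i: math.sqrt(d_squared[i]))
assert by_square == by_root  # sqrt is monotonic, so the winner is the same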

Kasra

Just quick speedups:

  1. You can omit math.sqrt()
  2. Create a dictionary of colors instead of a list (so you don't have to search for the index on every iteration)
  3. Use min() instead of sorted()

from tqdm import tqdm

list1 = [
    [255, 114, 70],
    [120, 89, 15], 
    [247, 190, 6],
    [41, 38, 37],
    [102, 102, 10],
    [255, 255, 255],
] * 3583180


RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# create a dictionary instead of a list:
Colors = {
    i: c
    for i, c in enumerate([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])
}


# Function to find the closest cluster by RGB distance - EDIT: square root omitted
def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2   # <-- you can omit math.sqrt


Filt_lab = []

# Match colors and make new list with indexed colors
for pixel in tqdm(list1):
    # use min() instead of sorted:
    closest_color = min(
        Colors, key=lambda color: distance(Colors[color], pixel)
    )
    Filt_lab.append(closest_color)

On my computer the speed went up from ~108,000 it/s to ~155,000 it/s.


Note: For this kind of task, the numpy library is a better fit.

Andrej Kesely
  • Thanks, I went up from 60k iterations/s to 80k, will see if there is anything left to improve. – Rivered Jul 23 '21 at 22:28

You can try creating and using a lookup table with 256 * 256 * 256 elements.

import numpy as np
from scipy.spatial import cKDTree
imarray = np.uint8(np.random.rand(3583180*6,3) * 255)

code=np.array([1, 256, 256*256])
RED=[255, 114, 70]
DARK_YELLOW=[120, 89, 15]
LIGHT_YELLOW=[247, 190, 6]
GREY=[102, 102, 10]
BLACK=[41, 38, 37]
WHITE=[255,255,255]
#your list of colors
Colors=[RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE]
#again converted to numpy
Colors_np = np.uint8(Colors) #axes: colors, color_channels

x=np.arange(256, dtype=np.uint8)
rgb=np.array(np.meshgrid(x, x, x)).T.reshape(-1,3)
rgb[:,[0, 1, 2]]=rgb[:,[1, 0, 2]] #swap columns
# rgb is table all colors

voronoi_kdtree = cKDTree(Colors_np) # Voronoi by base Colors

_, test_point_regions = voronoi_kdtree.query(rgb)
# test_point_regions is lookup table (LUT)

result=test_point_regions[np.dot(imarray, code)] #encode each pixel as r + 256*g + 65536*b, then look it up

assert np.all(test_point_regions[np.dot(Colors_np, code)]==np.array([0, 1, 2, 3, 4, 5]))
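
One possible follow-up, assuming memory matters in your setup: cKDTree.query returns integer indices that are 8 bytes each on a typical 64-bit build, so the 256**3-entry table weighs in around 134 MB; with only six classes, one byte per entry suffices, and the table can then be reused for every image that shares this palette:

test_point_regions = test_point_regions.astype(np.uint8)  #16,777,216 entries at 1 byte each: ~16 MB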
Alex Alex

I applied a list comprehension, under the impression that it would be much faster, but in the end it was even slightly slower:

from tqdm import tqdm

#Representative image
list1 = [[255, 114, 70],
        [120, 89, 15], 
        [247, 190, 6],
        [41, 38, 37],
        [102, 102, 10],
        [255, 255, 255],] * 3583180

#Colors of interest
RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# create a dictionary instead of a list:
Colors = {i: c for i, c in enumerate([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])}


# Function to find closes cluster by root and squareroot distance of RGB - EDIT: squareroot omitted 
def distance(c1, c2):
    (r1, g1, b1) = c1
    (r2, g2, b2) = c2
    return (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2   

#Attempt with list comprehension
Filt_lab = tqdm([(min(Colors, key=lambda color: distance(Colors[color], x))) for x in tqdm(list1)])

Before list comprehension: 59%|█████▉ | 12738323/21499080 [01:49<01:15, 116309.97it/s]

After list comprehension: 100%|██████████| 21499080/21499080 [03:10<00:00, 112848.76it/s]

Rivered

Basically I took @GPI's comment to heart: "Not an expert in high performance computing; nonetheless, my hunch is: drop the list of lists. This should be flat arrays. Your image should be [r1, g1, b1, r2, g2, b2, ...] or 3 arrays [r1, r2, ...], [g1, g2, ...], [b1, b2, ...] or a mux (use the first 8 bits of an int for r, the next 8 for g...). Then you should NOT compute differences pixel per pixel in a for loop, because you are then using highly non-consecutive memory regions. For each class, calculate the diff over the red arrays, then green, then blue. Then sum the diffs, then classify. And use numpy to make it all vectorized. – GPI"

I have completely rewritten the script and leave the image as is, not reshaping it into a list first. Only numpy operations are applied, and a full image now takes a couple of seconds to process (compared to 5 minutes earlier). Probably some small optimization steps can still be done.

from tqdm import tqdm
import numpy as np

#Representative image
imarray = np.random.rand(3650,2000,3) * 255
image=imarray.astype(np.uint64)

#Colors of interest
RED = [255, 0, 0]
DARK_YELLOW = [120, 89, 15]
LIGHT_YELLOW = [247, 190, 6]
BLACK = [41, 38, 37]
GREY = [102, 102, 10]
WHITE = [255, 255, 255]

# create a dictionary instead of a list:
Colors = {i: c for i, c in enumerate([RED, DARK_YELLOW, LIGHT_YELLOW, GREY, BLACK, WHITE])}

#Fill a dictionary: key = color index, value = array of summed squared channel differences for that color
img_dictionary={}
for key, value in Colors.items():
    R=(image[:,:,0].astype(np.uint64)-value[0])**2
    G=(image[:,:,1].astype(np.uint64)-value[1])**2
    B=(image[:,:,2].astype(np.uint64)-value[2])**2
    Total=R+G+B
    img_dictionary[key] = Total

#Stack all arrays from dictionary
arr = np.stack(list(img_dictionary.values()))

#Find array with lowest difference and map it to new array
classified_pixels=np.argmin(arr, axis=0)

#Convert array to list
cl_pixel_list = classified_pixels.reshape((classified_pixels.shape[0] * classified_pixels.shape[1])).tolist()

#Print
print(cl_pixel_list[0:10])
>>>[4, 1, 4, 3, 3, 4, 0, 2, 4, 4]
Rivered
  • 741
  • 7
  • 27
  • Two possible additions: are you sure you need uint64? The way I see it, you have discrete int values from 0 to 255. A single byte would suffice (`np.ubyte`), and you'd get *much* faster computation. I would also compare the speed of your `R`/`G`/`B` array calculations: as is, you subtract a constant from each array. Instead, I would try making each reference color into an array the same size as your image (e.g. a fully RED or DARK_YELLOW image), and operate the diff via numpy directly (e.g. subtract one stack from the other). Not sure this helps, though, but worth a try. (A sketch of this follows below.) – GPI Aug 04 '21 at 21:52
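
A minimal sketch of GPI's follow-up suggestion above, with one caveat: np.ubyte wraps around on subtraction (an unsigned type cannot hold negative channel differences), so a small signed type is used for the difference and the squares are accumulated in int32, which is still much smaller than uint64:

import numpy as np

#Representative image, as int16 so channel differences can go negative
image = (np.random.rand(3650, 2000, 3) * 255).astype(np.int16)
colors = np.array([[255, 0, 0], [120, 89, 15], [247, 190, 6],
                   [41, 38, 37], [102, 102, 10], [255, 255, 255]],
                  dtype=np.int16)

#Broadcast to axes [row, col, color, channel]; square in int32 because
#255**2 = 65025 already does not fit in int16
diff = image[:, :, np.newaxis, :].astype(np.int32) - colors
classified_pixels = np.argmin((diff ** 2).sum(axis=3), axis=2)

Note that this materializes a [row, col, color, channel] intermediate in memory, so it trades peak memory for speed.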