4

I have a function that eat a couple of seconds. The function should return the Top(n) colors in a given Image. The return must be sorted, so i could work with the rgb values from the first, second, third top most color.

On the fist hand i had an PIL.Image Object that i looped over x,y coords and counted it in an defaultdict. I've replaced the PIL Object in my project with a Numpy array what gived me a big Boost but i don't know how to replace the defaultdict in this situation.

My current solution:

import numpy as np
from scipy import misc  # for example Image
from collections import defaultdict

def count_colors(img, n):
    img = img.reshape(-1, img.shape[-1])
    color = defaultdict(int)

    for pixel in img:
        rgb = (pixel[0], pixel[1], pixel[2])
        color[rgb] += 1

    sorted_color = sorted(color.items(), key=lambda k_v: k_v[1], reverse=True)
    sorted_color = sorted_color[:n]

    return sorted_color

img = misc.face()  # example Numpy Image array
top_colors = count_colors(img, n=5)

display(top_colors)

Current Output:

[((9, 9, 9), 1062),
 ((10, 10, 10), 700),
 ((8, 8, 8), 668),
 ((9, 7, 8), 586),
 ((9, 7, 10), 579)]

Is there a real Numpy way to solve this?

Incur
  • 75
  • 7

1 Answers1

2

Approach #1

We can use np.unique(.., axis=0, return_counts=True) to get counts for each unique color and then np.argpartition for the top N colors among them for a compact vectorized solution -

def topN_colors(img, N):
    unqc,C = np.unique(img.reshape(-1,img.shape[-1]), axis=0, return_counts=True)
    topNidx = np.argpartition(C,-N)[-N:]
    return unqc[topNidx], C[topNidx]

Approach #2

Another mainly based on a 24-bit integer 2D reduction for a hopefully more efficient solution -

# https://stackoverflow.com/a/57236217/ @tstanisl
def scalarize(x):
    # compute x[...,2]*65536+x[...,1]*256+x[...,0] in efficient way
    y = x[...,2].astype('u4')
    y <<= 8
    y +=x[...,1]
    y <<= 8
    y += x[...,0]
    return y

def topN_colors_v2(img, N):
    img2D = scalarize(img)
    unq,idx,C = np.unique(img2D, return_index=True, return_counts=True)
    topNidx = np.argpartition(C,-N)[-N:]
    return img.reshape(-1,img.shape[-1])[idx[topNidx]], C[topNidx]

Note that argpartition doesn't keep the order. To keep order, use range() with it. More info. So, in np.argpartition replace -N with range(-N,0) to get colors and their counts in ascending order. For descending order, simply flip the final outputs.

Verification with sample

# Sample setup
np.random.seed(0)
# some random set colors
colors = np.array([[2,5,6],[1,2,3],[6,7,8],[5,3,1],[7,4,2]])

# Random image with colors chosen off colors
idx = np.random.randint(0,len(colors),(50,40))
img = colors[idx]
img = img.astype(np.uint8)

# Given that we know the unique colors as `colors` and the indices
# use to get the image `img, let's "manually" compute the 
# top N=2 colors and their counts
In [45]: count = np.bincount(idx.ravel())

In [46]: colors[count.argsort()[-2:]]
Out[46]: 
array([[1, 2, 3],
       [5, 3, 1]], dtype=uint8)

In [47]: count[count.argsort()[-2:]]
Out[47]: array([393, 446])

# Verify our solution
In [48]: topN_colors(img, N=2)
Out[48]: 
(array([[1, 2, 3],
        [5, 3, 1]], dtype=uint8),
 array([393, 446]))
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • Yes, it works and your second approach ist really fast. I need it really in descending order. Your range(-N,0) was the key. I modified the return to: return img.reshape(-1,img.shape[-1])[idx[topNidx]][::-1], C[topNidx][::-1] – Incur May 24 '20 at 21:47
  • tbh, i don't get the scalarize function. You never stop learning :) – Incur May 24 '20 at 21:52