23

I have a Python image-processing function that tries to get the dominant color of an image. It uses a function I found here: https://github.com/tarikd/python-kmeans-dominant-colors/blob/master/utils.py

It works, but unfortunately I don't quite understand what it does. I have also learned that np.histogram is rather slow and that I should use cv2.calcHist instead, since it is 40x faster according to this tutorial: https://docs.opencv.org/trunk/d1/db7/tutorial_py_histogram_begins.html

I'd like to understand how to update the code to use cv2.calcHist, or more precisely, which values I have to pass in.

My function

def centroid_histogram(clt):
    # grab the number of different clusters and create a histogram
    # based on the number of pixels assigned to each cluster
    num_labels = np.arange(0, len(np.unique(clt.labels_)) + 1)
    (hist, _) = np.histogram(clt.labels_, bins=num_labels)

    # normalize the histogram, such that it sums to one
    hist = hist.astype("float")
    hist /= hist.sum()

    # return the histogram
    return hist

The pprint of clt is this, not sure if this helps

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=1, n_init=10, n_jobs=1, precompute_distances='auto',
    random_state=None, tol=0.0001, verbose=0)

My code can be found here: https://github.com/primus852/python-movie-barcode

I am a complete beginner, so any help is highly appreciated.

As per request:

Sample image: [image not included]

Most dominant color:

rgb(22,28,37)

Computation time for the Histogram:

0.021515369415283203s

Divakar
PrimuS

3 Answers

19

Here are two approaches to get the most dominant color, one using np.unique and one using np.bincount. The linked page also mentions bincount as a faster alternative, so that could be the way to go.

Approach #1

def unique_count_app(a):
    colors, count = np.unique(a.reshape(-1,a.shape[-1]), axis=0, return_counts=True)
    return colors[count.argmax()]

Approach #2

def bincount_app(a):
    a2D = a.reshape(-1,a.shape[-1])
    col_range = (256, 256, 256) # generically : a2D.max(0)+1
    a1D = np.ravel_multi_index(a2D.T, col_range)
    return np.unravel_index(np.bincount(a1D).argmax(), col_range)
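
To see why Approach #2 works, here is a minimal sketch on a hypothetical 2 x 2 image (the array `img` and its values are made up for illustration). Each (R, G, B) triple is packed into a single integer, np.bincount counts those integers, and the most frequent one is unpacked again:

```python
import numpy as np

# Hypothetical 2x2 RGB image: three pixels share the color (10, 20, 30).
img = np.array([[[10, 20, 30], [10, 20, 30]],
                [[10, 20, 30], [12, 0, 5]]], dtype=np.uint8)

pixels = img.reshape(-1, 3).astype(np.intp)   # shape (4, 3): one row per pixel
shape = (256, 256, 256)                       # value range of each channel

# Pack each (r, g, b) into one integer: r*256*256 + g*256 + b.
codes = np.ravel_multi_index(tuple(pixels.T), shape)

# Count every code and pick the most frequent one.
winner = np.bincount(codes).argmax()

# Unpack the integer back into its three channel values.
dominant = tuple(int(c) for c in np.unravel_index(winner, shape))
print(dominant)  # (10, 20, 30)
```

The packing step is what makes this fast: counting one-dimensional integers is much cheaper than sorting and comparing whole rows.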

Verification and timings on a 1000 x 1000 color image with values in the dense range [0, 9), seeded for reproducible results:

In [28]: np.random.seed(0)
    ...: a = np.random.randint(0,9,(1000,1000,3))
    ...: 
    ...: print(unique_count_app(a))
    ...: print(bincount_app(a))
[4 7 2]
(4, 7, 2)

In [29]: %timeit unique_count_app(a)
1 loop, best of 3: 820 ms per loop

In [30]: %timeit bincount_app(a)
100 loops, best of 3: 11.7 ms per loop

Further boost

For large data, a further speedup is possible by leveraging multiple cores with the numexpr module:

import numexpr as ne

def bincount_numexpr_app(a):
    a2D = a.reshape(-1,a.shape[-1])
    col_range = (256, 256, 256) # generically : a2D.max(0)+1
    # row-major flattening: index = a0*s1*s2 + a1*s2 + a2
    eval_params = {'a0':a2D[:,0],'a1':a2D[:,1],'a2':a2D[:,2],
                   's1':col_range[1],'s2':col_range[2]}
    a1D = ne.evaluate('a0*s1*s2+a1*s2+a2',eval_params)
    return np.unravel_index(np.bincount(a1D).argmax(), col_range)

Timings -

In [90]: np.random.seed(0)
    ...: a = np.random.randint(0,9,(1000,1000,3))

In [91]: %timeit unique_count_app(a)
    ...: %timeit bincount_app(a)
    ...: %timeit bincount_numexpr_app(a)
1 loop, best of 3: 843 ms per loop
100 loops, best of 3: 12 ms per loop
100 loops, best of 3: 8.94 ms per loop
Divakar
  • That's so great and it's really fast. However, I cannot get the color from `.bincount_app` when I do `color = utils.bincount_app(image).astype('uint8').tolist()` it says `'tuple' object has no attribute 'astype'`. Same thing with `unique_count` works like a charm, but seems to be slower. – PrimuS Jun 17 '18 at 22:17
  • @PrimuS Simply do : `list(bincount_numexpr_app(a))`. – Divakar Jun 17 '18 at 22:19
  • Hm, sorry I feel useless, but `color = list(utils.bincount_numexpr_app(image))` and `cv2.rectangle(barcode, (0, 0), (width, height), color, -1)` leads to `Scalar value for argument 'color' is not numeric` – PrimuS Jun 17 '18 at 22:23
  • @PrimuS I am not sure about the expected input to the color argument there. Maybe it expects a tuple. So, try: `color = utils.bincount_numexpr_app(image)` or even `color = tuple(utils.bincount_numexpr_app(image))`? – Divakar Jun 17 '18 at 22:27
  • @PrimuS Is `barcode` a grayscale image or a color one? – Divakar Jun 17 '18 at 22:30
  • Barcode Rect is this `barcode = np.zeros((height, width, 3), dtype="uint8")` and from the OpenCV docs, it expects `CvScalar color`... – PrimuS Jun 17 '18 at 22:33
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/173302/discussion-between-divakar-and-primus). – Divakar Jun 17 '18 at 22:35
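
One possible cause of the `Scalar value for argument 'color' is not numeric` error discussed in the comments above: np.unravel_index returns NumPy integer scalars, and OpenCV may accept only plain Python numbers for a color. A hedged sketch of the conversion, using a made-up value in place of the result of `bincount_app(image)`:

```python
import numpy as np

# Stand-in for the result of bincount_app(image): a tuple of NumPy integers.
raw = np.unravel_index(22 * 256 * 256 + 28 * 256 + 37, (256, 256, 256))

# Convert every channel to a built-in int before using it as a color.
color = tuple(int(c) for c in raw)
print(color)  # (22, 28, 37)

# Assumed usage, following the call quoted in the comments:
# cv2.rectangle(barcode, (0, 0), (width, height), color, -1)
```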
12

@Divakar has given a great answer. If you would rather port your own k-means code to OpenCV, you can use cv2.kmeans:

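    import cv2
    import numpy as np

    # IMREAD_COLOR always yields 3 channels (BGR), so the reshape below is safe
    img = cv2.imread('image.jpg',cv2.IMREAD_COLOR)

    # flatten to a list of pixels; cv2.kmeans requires float32 input
    data = np.reshape(img, (-1,3))
    print(data.shape)
    data = np.float32(data)

    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    flags = cv2.KMEANS_RANDOM_CENTERS
    # K=1: a single cluster, whose center is reported as the dominant color
    compactness,labels,centers = cv2.kmeans(data,1,None,criteria,10,flags)

    print('Dominant color is: bgr({})'.format(centers[0].astype(np.int32)))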

Result for your image:

Dominant color is: bgr([41 31 23])

Time it took: 0.10798478126525879 secs
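
Note that with a single cluster, k-means has nothing to separate: the centroid is simply the mean of all pixels, which can differ from the most frequent color. A small sketch with made-up pixel values illustrates the difference:

```python
import numpy as np

# Made-up pixel list (BGR order): three dark pixels and one bright one.
data = np.array([[20, 20, 20],
                 [20, 20, 20],
                 [20, 20, 20],
                 [220, 220, 220]], dtype=np.float32)

# With K=1 the k-means centroid converges to the per-channel mean.
mean_color = data.mean(axis=0)

# The most frequent color is found by counting unique rows instead.
colors, counts = np.unique(data, axis=0, return_counts=True)
most_frequent = colors[counts.argmax()]

print(mean_color)      # average color
print(most_frequent)   # most common color
```

Which of the two counts as "dominant" depends on the use case; for a movie barcode the average may well be what is wanted.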

zindarod
4

The equivalent code for cv2.calcHist() is to replace:

(hist, _) = np.histogram(clt.labels_, bins=num_labels)  

with

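# cv2.calcHist expects a bin count and float32 data, not bin edges and int32 labels
labels = clt.labels_.astype('float32')
n_bins = len(np.unique(clt.labels_))

dmin, dmax, _, _ = cv2.minMaxLoc(labels)
dmax += 1  # labels are integer-valued and the upper range bound is exclusive

hist = cv2.calcHist([labels], [0], None, [n_bins], [dmin, dmax]).flatten()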

Note that cv2.calcHist accepts only uint8, uint16, and float32 element types, so the int32 labels produced by KMeans have to be converted first.
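
Since the cluster labels are small non-negative integers, a NumPy-only alternative is np.bincount, which the first answer mentions as the faster route. A minimal sketch, assuming the labels are passed in as a plain array (the function name and the `labels` and `n_clusters` parameters are illustrative, not part of the original function):

```python
import numpy as np

def centroid_histogram_bincount(labels, n_clusters):
    # Count how many pixels were assigned to each cluster label.
    counts = np.bincount(labels, minlength=n_clusters).astype("float")
    # Normalize so that the histogram sums to one.
    return counts / counts.sum()

# Illustrative labels, as clt.labels_ might look for three clusters.
labels = np.array([0, 2, 2, 1, 2, 0])
hist = centroid_histogram_bincount(labels, 3)
print(hist)
```

With scikit-learn's KMeans this would presumably be called as `centroid_histogram_bincount(clt.labels_, clt.n_clusters)`.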

Update

OpenCV and NumPy appear to bin differently: the two histograms differ when the number of bins does not evenly divide the value range:

import numpy as np
from matplotlib import pyplot as plt
import cv2

#data = np.random.normal(128, 1, (100, 100)).astype('float32')
data = np.random.randint(0, 256, (100, 100), 'uint8')
BINS = 20

np_hist, _ = np.histogram(data, bins=BINS)

dmin, dmax, _, _ = cv2.minMaxLoc(data)
if np.issubdtype(data.dtype, np.floating): dmax += np.finfo(data.dtype).eps
else: dmax += 1

cv_hist = cv2.calcHist([data], [0], None, [BINS], [dmin, dmax]).flatten()

plt.plot(np_hist, '-', label='numpy')
plt.plot(cv_hist, '-', label='opencv')
plt.gcf().set_size_inches(15, 7)
plt.legend()
plt.show()
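
One likely reason for the mismatch: by default np.histogram spreads its bins over [min, max] of the data, whereas the cv2.calcHist call above is given the range [dmin, dmax + 1). A NumPy-only sketch of the two sets of bin edges (the data values are chosen for illustration):

```python
import numpy as np

# Illustrative uint8 data covering the full range 0..255.
data = np.arange(256, dtype=np.uint8).reshape(16, 16)
BINS = 20

# Default NumPy behavior: edges run from data.min() to data.max().
_, default_edges = np.histogram(data, bins=BINS)

# Edges over [min, max + 1), mirroring the range given to cv2.calcHist.
lo, hi = int(data.min()), int(data.max()) + 1
ranged_hist, ranged_edges = np.histogram(data, bins=BINS, range=(lo, hi))

print(default_edges[-1], ranged_edges[-1])   # 255.0 vs 256.0
print(ranged_hist.sum())                     # all 256 values are counted
```

Passing the same `range` to np.histogram should bring the two results closer together, although I have not checked this against every OpenCV version.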
Timo