7

For a project, I want to implement a color-clustering algorithm that replaces similar colors with the average color of their cluster.

For now, I use the k-means algorithm to cluster the whole image, but this takes a long time. Does anyone have an idea how to run k-means on a color histogram instead, so that the clustering becomes fast enough to use?

Has QUIT--Anony-Mousse
501 - not implemented

3 Answers

6

Downsample the image first, then run k-means.

If you resize the image to 1/2 of its size in both x and y, it shouldn't affect the colors much, but k-means has only 1/4 of the pixels to process and should take about 1/4 of the time. If you resample to 1/10 of the width and height, there are 100 times fewer pixels, so k-means should run about 100 times faster.

https://en.wikipedia.org/wiki/Color_quantization

By downsampling the image, you have fewer pixels to process during clustering. But in the end, it should produce roughly the same color scheme.
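To make the pixel count concrete, here is a minimal sketch of the downsampling step. It assumes the image is a row-major list of `(r, g, b)` tuples, and the helper name `downsample` is made up for this example. It simply keeps every `step`-th pixel; a proper resize that averages neighbouring pixels would probably be a better choice in practice.

```python
def downsample(pixels, width, height, step):
    """Keep every `step`-th pixel in x and y (plain subsampling, no averaging).

    `pixels` is assumed to be a row-major list of (r, g, b) tuples
    of length width * height.
    """
    return [
        pixels[y * width + x]
        for y in range(0, height, step)
        for x in range(0, width, step)
    ]


# Synthetic 100x100 test image, subsampled by 10 in each direction.
image = [(x % 256, y % 256, 0) for y in range(100) for x in range(100)]
small = downsample(image, 100, 100, 10)
print(len(image), len(small))  # 10000 100
```

The thumbnail has 1/100 of the pixels, which is where the factor of 100 comes from.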

Small summary of k-means:

  1. It maps each object (=pixel) to the nearest cluster center (= palette entry)
  2. It recomputes each palette entry to best represent the assigned points (= pixels)
  3. Repeat until nothing changes anymore.
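The three steps above can be sketched in plain Python as follows. This is only an illustration, not how OpenCV implements it: the function names are made up, distances are squared Euclidean distances in RGB, and the starting palette is taken from evenly spaced pixels purely to keep the example deterministic (real implementations usually seed randomly or with k-means++).

```python
def nearest(color, palette):
    """Index of the palette entry closest to `color` (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda j: sum((c - q) ** 2 for c, q in zip(color, palette[j])),
    )


def kmeans(pixels, k, max_iter=50):
    # Assumption for this sketch: seed the palette with evenly spaced pixels.
    palette = [tuple(float(c) for c in pixels[i * len(pixels) // k])
               for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Step 1: map each pixel to the nearest palette entry.
        new_labels = [nearest(p, palette) for p in pixels]
        # Step 3: stop as soon as no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: recompute each palette entry as the mean of its pixels.
        for j in range(k):
            members = [p for p, l in zip(pixels, labels) if l == j]
            if members:  # an empty cluster keeps its old color
                palette[j] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return palette, labels


pixels = [(250, 0, 0), (255, 10, 0), (245, 5, 5),
          (0, 0, 250), (10, 0, 255), (5, 5, 245)]
palette, labels = kmeans(pixels, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each pass over the data costs one distance computation per pixel and palette entry, which is why the number of pixels matters so much for the running time.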

So the real output is not an image or image regions. It's the palette.

You can then map an arbitrary image (including the full resolution version) to this color palette by simply replacing each pixel with the closest color!
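A minimal sketch of that last mapping step, assuming the palette is already known and pixels are `(r, g, b)` tuples (the names `closest_color` and `apply_palette` are made up for this example):

```python
def closest_color(color, palette):
    """Palette entry with the smallest squared RGB distance to `color`."""
    return min(palette,
               key=lambda p: sum((c - q) ** 2 for c, q in zip(color, p)))


def apply_palette(pixels, palette):
    """Replace every pixel with its closest palette color."""
    return [closest_color(p, palette) for p in pixels]


palette = [(250, 5, 2), (5, 2, 250)]           # e.g. learned on a thumbnail
full_res = [(240, 20, 10), (0, 0, 200), (255, 0, 0), (30, 30, 220)]
print(apply_palette(full_res, palette))
# [(250, 5, 2), (5, 2, 250), (250, 5, 2), (5, 2, 250)]
```

This is a single pass over the full-resolution image, so it costs about as much as one k-means iteration.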

Complexity and performance:

The complexity of k-means is O(n*k*i), where n is the number of pixels you have, k the desired number of output colors and i the number of iterations needed until convergence.

n: by downsampling, you can easily reduce n, the largest factor. In many situations, you can reduce it quite significantly before you see a degradation in the quality of the resulting palette.

k: this is your desired number of output colors. Whether you can reduce this or not depends on your actual use case.

i: various factors can have an effect on convergence (including both other factors!), but the strongest probably is having good starting values. So if you have a very fast but low quality method to choose the palette, run it first, then use k-means to refine this palette. Maybe OpenCV already includes an appropriate heuristic for this though!

As you can see, the easiest approach is to reduce n. You can reduce n significantly, produce an optimized palette for the thumbnail, then rerun k-means on the full image to refine this palette. Since this will hopefully reduce the number of iterations on the full image significantly, it can sometimes perform very well.
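Here is a rough sketch of this two-stage idea, under a few assumptions: the image is a flat list of `(r, g, b)` tuples, the thumbnail is taken by keeping every 10th pixel, and the crude starting palette is picked by hand. The function names are made up. Whether the warm start really saves iterations depends on the image; the toy data here is too simple to show a difference.

```python
def assign(pixels, palette):
    """Label each pixel with the index of its nearest palette entry."""
    return [
        min(range(len(palette)),
            key=lambda j: sum((c - q) ** 2 for c, q in zip(p, palette[j])))
        for p in pixels
    ]


def kmeans(pixels, palette, max_iter=50):
    """Refine a given starting palette; returns the refined palette."""
    palette = [tuple(float(c) for c in p) for p in palette]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(pixels, palette)
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for j in range(len(palette)):
            members = [p for p, l in zip(pixels, labels) if l == j]
            if members:
                palette[j] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return palette


# Toy "image": 50 reddish and 50 bluish pixels.
full = ([(200 + i % 5, 0, 0) for i in range(50)]
        + [(0, 0, 100 + i % 5) for i in range(50)])
thumb = full[::10]                      # 1/10 of the pixels

crude = [(255, 0, 0), (0, 0, 255)]      # hand-picked, low-quality start
coarse = kmeans(thumb, crude)           # cheap: runs on the thumbnail only
final = kmeans(full, coarse)            # refinement on the full image
print(coarse)
print(final)
```

The expensive full-resolution run starts from a palette that is already close to the final one, so it only has to make small corrections.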

Has QUIT--Anony-Mousse
1

My answer is not about histogram clustering, but I recently needed to speed up the clustering step of my own algorithm. To do so, I did the following:

  1. Resized the image to a smaller one (as already suggested in another answer).
  2. Reduced the number of colors (image quantization), as suggested in "How to reduce the number of colors in an image with OpenCV?".

This sped up the clustering several times over. You can also try playing around with OpenCV's mean-shift filtering.
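For illustration, one simple way to reduce the number of colors is uniform quantization, i.e. snapping each channel to a few fixed levels. This is only an assumption about what such a step could look like, not necessarily the method from the linked question, and the function name and default of 8 levels are made up:

```python
def quantize(pixels, levels_per_channel=8):
    """Snap each 8-bit channel to the centre of one of a few equal buckets."""
    step = 256 // levels_per_channel
    return [tuple((c // step) * step + step // 2 for c in p) for p in pixels]


pixels = [(250, 5, 130), (255, 0, 128), (10, 200, 40)]
reduced = quantize(pixels)
print(reduced)            # [(240, 16, 144), (240, 16, 144), (16, 208, 48)]
print(len(set(reduced)))  # 2 distinct colors instead of 3
```

With 8 levels per channel there are at most 512 distinct colors left, regardless of the image size.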

ArtemStorozhuk
  • Actually, he wants to use k-means for color quantization. Why would you first use quantization A to then run quantization B? Note that he is not clustering images, but colors. – Has QUIT--Anony-Mousse Nov 29 '12 at 14:58
  • @Anony-Mousse What do you mean by saying `quantization A` and `B`? – ArtemStorozhuk Nov 29 '12 at 16:14
  • see https://en.wikipedia.org/wiki/Color_quantization -- k-means can be used for color quantization, which is probably what he is trying to do. It does not make sense to use a different color quantization before, does it? – Has QUIT--Anony-Mousse Nov 29 '12 at 17:22
  • @Anony-Mousse It will improve speed - k-means works faster on a smaller number of colors. – ArtemStorozhuk Nov 29 '12 at 19:35
  • K-means complexity is `O(n k i)` where `n` is the number of pixels, `k` the number of clusters, and `i` is the number of iterations until convergence. Of course, if you pre-quantize the image a lot, k-means will likely need fewer iterations, but it will also degrade quality much more. Do you have any reference that says you should pre-quantize for k-means? – Has QUIT--Anony-Mousse Nov 30 '12 at 00:03
  • @Anony-Mousse sorry, I don't have it. Actually my supervisor advised me to do that. – ArtemStorozhuk Nov 30 '12 at 13:40
0

You need to assign a weight to each data point, i.e. the number of values in its histogram bin. Then, when you compute the new value of a cluster centroid, you use a weighted average instead of a plain average. However, the interface of OpenCV's k-means clustering does not support weighted values. You can use the C Clustering Library, which does support them, is quite well documented (although its examples come from bioinformatics), and is easy to integrate (a single .h/.c file).
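If pulling in another library is not an option, the weighted update is small enough to write by hand. The following is a sketch only, with made-up names; it assumes each distinct color is its own histogram bin and that the starting palette is given. With coarser bins, the result would only approximate k-means on the raw pixels.

```python
from collections import Counter


def weighted_kmeans(colors, weights, palette, max_iter=50):
    """k-means on histogram bins: `weights[i]` is the pixel count of `colors[i]`."""
    palette = [tuple(float(c) for c in p) for p in palette]
    labels = None
    for _ in range(max_iter):
        # Assignment works on bins, not pixels, so it is much cheaper.
        new_labels = [
            min(range(len(palette)),
                key=lambda j: sum((c - q) ** 2
                                  for c, q in zip(col, palette[j])))
            for col in colors
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for j in range(len(palette)):
            total = sum(w for w, l in zip(weights, labels) if l == j)
            if total:
                # Weighted average: each bin counts as often as it occurs.
                palette[j] = tuple(
                    sum(w * col[ch]
                        for col, w, l in zip(colors, weights, labels)
                        if l == j) / total
                    for ch in range(3)
                )
    return palette


pixels = [(200, 0, 0)] * 3 + [(210, 0, 0)] + [(0, 0, 100)] * 4
hist = Counter(pixels)                  # color -> number of pixels
colors = list(hist)
weights = [hist[c] for c in colors]
palette = weighted_kmeans(colors, weights, [(255, 0, 0), (0, 0, 255)])
print(palette)  # [(202.5, 0.0, 0.0), (0.0, 0.0, 100.0)]
```

Here 8 pixels collapse into 3 bins, and the red centroid (202.5) is the same value a plain average over the 4 red pixels would give. On a real image the saving comes from the number of distinct colors being far smaller than the number of pixels.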

remi