Can K-means be used to help in pixel-value based separation of an image?

Question

I'm trying to separate a grey-level image based on pixel value: say, pixels from 0 to 60 in one bin, 60-120 in another, 120-180 in the next, and so on up to 255. The ranges are roughly equispaced in this case. However, would K-means clustering give more realistic estimates of what my pixel-value ranges should be? I'm trying to group similar pixels together and not waste bins on ranges where few pixels are present.

EDITS (to include obtained results): enter image description here

k-means with number of clusters = 5

That sounds an awful lot like histogram equalization... http://fourier.eng.hmc.edu/e161/lectures/contrast_transform/node3.html — Joe Kington, Mar 24 '11 at 02:20
as @Throwback1986 said it's not really that similar. I'm not trying to equalize a histogram but split the histogram most efficiently. — AruniRC, Mar 24 '11 at 09:07

score 11 · Dr. belisarius · Accepted Answer · 2011-03-26T17:45:30.537

Of course K-Means can be used for color quantization. It's very handy for that.

Let's see an example in Mathematica:

We start with a greyscale (150x150) image:

enter image description here

Let's see how many grey levels there are when representing the image in 8 bits:

ac = ImageData[ImageTake[i, All, All], "Byte"];
First@Dimensions@Tally@Flatten@ac
-> 234

Ok. Let's reduce those 234 levels. Our first try will be to let the algorithm determine on its own how many clusters there are, using the default configuration:

ic = ClusteringComponents[Image@ac];
First@Dimensions@Tally@Flatten@ic 
-> 3

It selects 3 clusters, and the corresponding image is:

enter image description here

Whether that is enough, or you need more clusters, is up to you.

Let's suppose you decide that a more fine-grained color separation is needed. Let's ask for 6 clusters instead of 3:

ic2 = ClusteringComponents[Image@ac, 6];
Image@ic2 // ImageAdjust

Result:

enter image description here

and here are the pixel ranges used in each bin:

(* ac holds the original byte data and ic2 the 6-cluster labels from above *)
Table[{Min@#, Max@#} &@(Take[ac, {#[[1]]}, {#[[2]]}] & /@ 
    Position[ic2, n]), {n, 1, 6}]
-> {{0, 11}, {12, 30}, {31, 52}, {53, 85}, {86, 134}, {135, 241}}

and the number of pixels in each bin:

Table[Count[Flatten@ic2, i], {i, 6}]
-> {8906, 4400, 4261, 2850, 1363, 720}

So, the answer is YES, and it is straightforward.
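
For readers working outside Mathematica, the same idea can be sketched in plain Python. This is only an illustration, not what ClusteringComponents does internally: it runs a basic Lloyd-style k-means on the grey-level histogram. The function names, the evenly spread initial centres, and the toy pixel list are all assumptions made for the example.

```python
from collections import Counter

def assign_levels(levels, centres):
    """Group each grey level with its nearest centre (assignment step)."""
    groups = [[] for _ in centres]
    for g in levels:
        nearest = min(range(len(centres)), key=lambda c: abs(g - centres[c]))
        groups[nearest].append(g)
    return groups

def kmeans_grey_bins(pixels, k, iters=100):
    """Return (min level, max level, pixel count) for each of k clusters."""
    hist = Counter(pixels)                 # grey level -> number of pixels
    levels = sorted(hist)
    if not 1 <= k <= len(levels):
        raise ValueError("k must be between 1 and the number of distinct levels")
    # Assumed initialisation: centres spread evenly over the occupied levels.
    centres = [levels[(2 * i + 1) * len(levels) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = assign_levels(levels, centres)
        # Update step: each centre moves to the count-weighted mean of its group.
        updated = [
            sum(g * hist[g] for g in grp) / sum(hist[g] for g in grp) if grp else centres[i]
            for i, grp in enumerate(groups)
        ]
        if updated == centres:             # converged
            break
        centres = updated
    groups = assign_levels(levels, centres)
    return [(min(grp), max(grp), sum(hist[g] for g in grp)) for grp in groups if grp]

# Toy data standing in for a flattened greyscale image (hypothetical values).
pixels = [10] * 50 + [12] * 30 + [100] * 20 + [105] * 10 + [240] * 5
print(kmeans_grey_bins(pixels, 3))
# [(10, 12, 80), (100, 105, 30), (240, 240, 5)]
```

Working on the histogram rather than on every pixel keeps each iteration at no more than 256 levels times k comparisons for an 8-bit image, whatever the image size.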

Edit

Perhaps this will help you understand what you are doing wrong in your new example.

If I cluster your color image and use the cluster number to represent brightness, I get:

enter image description here

That's because the clusters are not being numbered in an ascending brightness order.

But if I calculate the mean brightness value for each cluster, and use it to represent the cluster value, I get:

enter image description here

In my previous example, that was not needed, but that was just luck :D (i.e. clusters were found in ascending brightness order)
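
The fix described here can be sketched in plain Python as follows. It assumes the image has been flattened to a list of grey values and that some clustering step has already produced one label per pixel; the function name and the sample data are hypothetical.

```python
def labels_to_mean_grey(pixels, labels):
    """Replace each cluster label by the rounded mean grey level of its pixels."""
    totals, counts = {}, {}
    for p, lab in zip(pixels, labels):
        totals[lab] = totals.get(lab, 0) + p
        counts[lab] = counts.get(lab, 0) + 1
    means = {lab: round(totals[lab] / counts[lab]) for lab in totals}
    return [means[lab] for lab in labels]

# Labels here are deliberately NOT in ascending brightness order:
# cluster 1 is the brightest and cluster 2 the darkest.
pixels = [200, 210, 20, 30, 120]
labels = [1, 1, 2, 2, 3]
print(labels_to_mean_grey(pixels, labels))
# [205, 205, 25, 25, 120]
```

Displaying the labels directly would render the brightest cluster (label 1) as the darkest shade, which is the effect seen in the first picture above.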

edited Mar 26 '11 at 17:45

answered Mar 24 '11 at 03:36

Dr. belisarius


thanks for that detailed explanation, got the basic concept. I'll be doing it with OpenCV and VC++ though - any idea how straightforward that would be? – AruniRC Mar 24 '11 at 08:34
@Aruni I really don't know, sorry. Please note that I did the clustering in the one-dimensional color space, not in the image space. That means it is not useful for detecting objects, but it compresses and simplifies the image A LOT. If you need to do spatial clustering, perhaps mean-shift is a good option too. – Dr. belisarius Mar 24 '11 at 08:51
lol i just ran it in OpenCV. After fishing about the documentation and the yahoo discussion group. works fine. thanks anyways. – AruniRC Mar 24 '11 at 09:08
@belisarius Updated. Having further questions - like why dark regions are clustered into light shades. but thats for another question altogether i guess. – AruniRC Mar 26 '11 at 12:31
@Aruni I'm pretty sure you're on the right path, and only a minor misunderstanding is preventing you from completing the task. The problem with your image is that you are using something like the "cluster number" to represent the grayscale, while you should be using something like the mean value of the pixels in the cluster. Wait ... it seems you also forgot to convert to grayscale before clustering! HTH! – Dr. belisarius Mar 26 '11 at 13:56
it helped. pity i don't know Mathematica - the programming language divide was causing the implementation confusion. BTw i had tried mean-shift filtering using image pyramids but edges got blurry: exactly the thing i am trying to avoid using k-means and other stuff – AruniRC Apr 02 '11 at 04:35
@Aruni K-Means is OK for what you are trying to do. Just get the mean value FOR EACH cluster as the cluster representative color, and you are done – Dr. belisarius Apr 02 '11 at 19:39
A fast kmeans like pixel clustering is described at this blog post: http://www.modejong.com/blog/post17_divquant_clustering – MoDJ Jun 21 '16 at 20:22

score 2 · Answer 2 · answered Mar 24 '11 at 03:18

k-means could be applied to your problem. If it were me, I would first try a simpler approach borrowed from decision trees (although "simpler" depends on your precise clustering algorithm!)

Assume one bin exists and begin stuffing the pixel intensities into it. When the bin is "full enough", compute the mean and standard deviation of the bin (or node). If the standard deviation is greater than some threshold, split the node in half. Continue this process until all intensities have been processed, and you will have a more efficient histogram.

This method can be improved with additional details of course:

You might consider using kurtosis as a splitting criterion.
Skewness might be used to determine where the split occurs.
You might cross all the way into decision tree land and borrow the Gini index to guide splitting (some split techniques rely on more "exotic" statistics, like the t-test).
Lastly, you might perform a final consolidation pass to collapse any sparsely populated nodes.

Of course, if you've applied all of the above "improvements", then you've basically implemented one variation of a k-means clustering algorithm ;-)
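
A minimal Python sketch of the basic splitting idea follows. It is a simplified batch variant, assuming all pixel values are available up front rather than streamed into a bin, and it always splits a range at its midpoint. The function name, the `max_std` threshold, and the toy data are assumptions for illustration.

```python
from statistics import pstdev

def split_bins(pixels, lo=0, hi=255, max_std=10.0):
    """Recursively halve [lo, hi] while the pixels inside spread too widely.

    Returns (range start, range end, pixel count) for each non-empty bin.
    """
    inside = [p for p in pixels if lo <= p <= hi]
    if not inside:
        return []                          # drop ranges that hold no pixels
    if lo == hi or pstdev(inside) <= max_std:
        return [(lo, hi, len(inside))]     # tight enough: keep as one bin
    mid = (lo + hi) // 2                   # "split the node in half"
    return (split_bins(inside, lo, mid, max_std)
            + split_bins(inside, mid + 1, hi, max_std))

# Toy data standing in for a flattened greyscale image (hypothetical values).
pixels = [10] * 50 + [12] * 30 + [100] * 20 + [105] * 10 + [240] * 5
print(split_bins(pixels))
# [(0, 63, 80), (64, 127, 30), (128, 255, 5)]
```

The bins come out as ranges of the intensity axis, so their boundaries always fall on halving points. The kurtosis, skewness, and consolidation refinements listed above are not implemented here.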

Note: I disagree with the comment above - the problem you describe does not appear closely related to histogram equalization.

true it's not histogram equalization - which has a one-line function call in OpenCV (and i will not be re-inventing the wheel when the actual project is text extraction from really blurred street scenes). — AruniRC, Mar 24 '11 at 08:36
