5

Possible Duplicate:
How do I determine k when using k-means clustering?

How can I choose K initially if I do not know anything about the data?

Can someone help me choose K?

Thanks, Navin

Navin
  • It's important to realize that there isn't a fully principled way of doing clustering. Generally, you have to implicitly specify the density. For k-means you specify the density via the number of clusters. For mean-shift you have to choose the neighbourhood size. Even if you use some criterion to choose the number of clusters or the neighbourhood size, you have still chosen to use that method. – YXD Jun 02 '11 at 09:48
  • You may find some useful clues on [CrossValidated](http://stats.stackexchange.com/), by looking at the [clustering](http://stats.stackexchange.com/questions/tagged/clustering) tag. – chl Jun 03 '11 at 09:39
  • Exact duplicates @ http://stackoverflow.com/q/1793532/353278 && http://stackoverflow.com/q/5933970/353278 – Jeff Jun 06 '11 at 04:20
  • I've answered a similar Q with half a dozen methods (using `R`) over here: stackoverflow.com/a/15376462/1036500 – Ben May 13 '13 at 04:52

2 Answers

0

The basic idea is to evaluate a cluster score on sample data; usually it is built from the distance inside clusters and the distance between clusters. The higher this measure, the better the clustering, so you can use it to select the best clustering parameters. One such metric can be found here: http://alias-i.com/lingpipe/docs/api/com/aliasi/cluster/ClusterScore.html
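
To make the idea concrete, here is a minimal Python sketch. It is not the LingPipe metric linked above; it swaps in the silhouette score, which is one common way to combine "distance inside a cluster" with "distance to the nearest other cluster". The helper names (`kmeans`, `silhouette`, `make_blobs`, `best_k`), the synthetic data, and the farthest-first initialization are all assumptions made for illustration.

```python
import math
import random

def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on a list of (x, y) tuples; returns one label per point."""
    # Farthest-first initialization: an assumption of this sketch, chosen so runs are repeatable.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (left in place if it has none).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette: a = mean distance inside own cluster, b = mean distance to nearest other cluster."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def make_blobs(seed=1):
    """Hypothetical sample data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0, 0), (10, 0), (5, 9)]
    return [(rng.gauss(cx, 0.7), rng.gauss(cy, 0.7))
            for cx, cy in blob_centers for _ in range(40)]

def best_k(points, k_values=range(2, 7)):
    """Score each candidate k and return the best one together with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Calling `best_k(make_blobs())` returns the k with the highest score and a dict of all scores. On real data the peak is usually less clear-cut, so treat the result as a hint, not a definitive answer.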

yura
-8

Seriously, what do you want to know? Do you want us to tell you some number? Or a strategy for finding the optimal k? You have to read a book or other resources about k-means; I'm pretty sure it is covered there.

There is something on Wikipedia about it:

http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

Before you use an algorithm, read about it.

Felix Kling