
I wonder what kind of seed selection methods I can apply to the k-means algorithm. A Google search wasn't that helpful. Any suggestions?

Peter O.
  • Look at two-pass k-means: k-means a random sample, use those centres as seeds for the lot. See [should-we-use-k-means++](http://stackoverflow.com/questions/4706678/should-we-used-k-means-instead-of-k-means). – denis Nov 28 '11 at 12:22

2 Answers


The best seeds depend on the domain. For example, if your data items are words, your seeds should be the most frequent words. Otherwise, you could cluster a small sample and use the resulting cluster centres as seeds.
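As a rough sketch of the sample-based option: the snippet below runs plain k-means (Lloyd's algorithm) on a random sample and reuses the resulting centres as seeds for the full data set. It is pure Python, for illustration only. The names `lloyd` and `sample_seeded_kmeans`, the `sample_size` parameter, and the iteration cap are assumptions of this sketch, not part of any library.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centres, iterations=20):
    """Plain k-means (Lloyd's algorithm) started from the given centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)),
                          key=lambda i: squared_distance(p, centres[i]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group.
        # An empty group keeps its old centre.
        new_centres = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centres)
        ]
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return centres


def sample_seeded_kmeans(points, k, sample_size, rng=random):
    """Two-pass k-means: cluster a random sample, then seed the full run."""
    sample = rng.sample(points, min(sample_size, len(points)))
    # First pass: random seeds, clustered on the sample only (hypothetical choice).
    sample_centres = lloyd(sample, rng.sample(sample, k))
    # Second pass: the sample's centres become the seeds for all points.
    return lloyd(points, sample_centres)


# Example: three synthetic 2-D blobs, seeded from a 60-point sample.
rng = random.Random(0)
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
          for cx, cy in [(0, 0), (10, 10), (0, 10)] for _ in range(100)]
centres = sample_seeded_kmeans(points, k=3, sample_size=60, rng=rng)
```

The first pass is cheap because it only touches the sample, so you can afford to repeat it with different random seeds and keep the best result before running on everything.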

Here is an example of a more sophisticated algorithm:

K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G.R. Sridhar. Single Pass Seed Selection Algorithm for k-Means. Journal of Computer Science 6 (1): 60-66, 2010.

cyborg

Google for "supervised" k-means clustering and k-means++. Also specify your performance needs: what is your k, and how many input points do you have?
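For reference, here is a minimal sketch of k-means++ seeding, which is what "k++ means" appears to refer to. The first seed is picked uniformly at random; each later seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. The function name `kmeanspp_seeds` and its parameters are assumptions made for this illustration, not a library API.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=random):
    """Pick k seeds using k-means++ style distance-weighted sampling."""
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed so far.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            nxt = rng.choice(points)
        else:
            # Far-away points are more likely to become the next seed.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        seeds.append(nxt)
        d2 = [min(d, squared_distance(p, nxt)) for d, p in zip(d2, points)]
    return seeds


# Example: choose 2 seeds from a small 2-D data set.
seeds = kmeanspp_seeds([(0, 0), (0, 1), (9, 9), (10, 10)], 2, random.Random(0))
```

The returned seeds are then used as the initial centres for an ordinary k-means run.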

In general, a few thousand points can easily be clustered with a naive k-means implementation, so I would try that first.

Also, if you're not sure what k should be, try MCL clustering first to get a good estimate.

jayunit100