
I know what k-means is, and I also understand the k-means++ algorithm. I believe the only difference is the way the initial k centers are chosen.

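In the ++ version, we first choose one center uniformly at random and then use a probability distribution (proportional to each point's squared distance to its nearest already-chosen center) to choose the remaining k-1 centers.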
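For reference, the k-means++ seeding described above can be sketched in plain Python. This is a minimal illustration, assuming points are tuples of floats and squared Euclidean distance; the function names (sq_dist, kmeans_pp_init) and the rng argument are hypothetical and this is not MLlib's code.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """Hypothetical k-means++ seeding sketch: the first center is picked
    uniformly at random; each further center is picked with probability
    proportional to D(x)^2, the squared distance from x to its nearest
    already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (10.0, 0.0), (9.8, 0.3)]
print(kmeans_pp_init(points, 3, random.Random(42)))
```

Points far from the current centers are the most likely picks, which is what tends to spread the initial centers across the data.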

In the MLlib implementation of k-means, what is the initializationSteps parameter?

zero323
London guy

1 Answer


To be precise, k-means++ is an algorithm for choosing the initial centers; it doesn't describe the whole training process.
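To illustrate that distinction, here is a minimal sketch of the standard k-means training loop (Lloyd's algorithm) in plain Python. It assumes tuple-of-float points and squared Euclidean distance; lloyd and its arguments are hypothetical names. The seeding method (random, k-means++, k-means||) only supplies the starting centers; the loop itself is the same either way.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centers, max_iter=20):
    """Hypothetical sketch of k-means training (Lloyd's algorithm) starting
    from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster,
        # keeping the old center if its cluster is empty.
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
# Plain random seeding here; k-means++ or k-means|| would only change this line.
initial = random.Random(7).sample(points, 2)
print(lloyd(points, initial))
```

For example, starting from [(0.0, 0.0), (10.0, 0.0)] the loop converges to [(0.0, 1.0), (10.0, 1.0)], while a poor start such as [(0.0, 0.0), (0.0, 2.0)] gets stuck in a worse local optimum. That sensitivity is why the seeding step matters.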

MLlib k-means uses k-means|| for initialization, which is a distributed variant of k-means++. Instead of sampling a single point per step, it samples multiple points in each of a number of iterations.

initializationSteps corresponds to the number of these iterations (sampling rounds), and according to the original paper it should be roughly O(log n).
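A rough plain-Python sketch of k-means|| seeding shows where a parameter like initializationSteps fits. This is not MLlib's code. The names kmeans_parallel_init, initialization_steps, oversampling and rng are hypothetical. The oversampling factor l = 2k is an assumed default, and the final local reduction is simplified to weighted k-means++ seeding on the candidates. Each round samples every point independently with probability min(1, l * D(x)^2 / cost), so a round adds at most l candidates in expectation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_sq_dist(p, centers):
    """Squared distance from p to its nearest center."""
    return min(sq_dist(p, c) for c in centers)

def weighted_kmeans_pp(cands, weights, k, rng):
    """k-means++ seeding on weighted candidates: pick probability is
    proportional to weight * D(x)^2 (the first pick to weight alone).
    May return fewer than k centers if the candidates run out."""
    centers = [rng.choices(cands, weights=weights, k=1)[0]]
    while len(centers) < k:
        w = [wt * nearest_sq_dist(c, centers) for c, wt in zip(cands, weights)]
        if sum(w) == 0:
            break  # no weighted candidate lies away from the chosen centers
        centers.append(rng.choices(cands, weights=w, k=1)[0])
    return centers

def kmeans_parallel_init(points, k, initialization_steps, rng, oversampling=None):
    """Hypothetical k-means|| seeding sketch."""
    l = oversampling if oversampling is not None else 2 * k  # assumed default
    candidates = [rng.choice(points)]
    for _ in range(initialization_steps):
        dists = [nearest_sq_dist(p, candidates) for p in points]
        cost = sum(dists)
        if cost == 0:
            break  # every point already coincides with a candidate
        # rng.random() < x is always true for x >= 1, i.e. min(1, x).
        new = [p for p, d in zip(points, dists) if rng.random() < l * d / cost]
        candidates.extend(new)
    # Weight each candidate by how many points are closest to it.
    weights = [0] * len(candidates)
    for p in points:
        j = min(range(len(candidates)), key=lambda i: sq_dist(p, candidates[i]))
        weights[j] += 1
    # Reduce the (usually more than k) weighted candidates to k centers locally.
    return weighted_kmeans_pp(candidates, weights, k, rng)

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0), (10.1, 0.2)]
print(kmeans_parallel_init(points, k=3, initialization_steps=5, rng=random.Random(0)))
```

Because each round can add several points at once, only a few rounds are needed to build a good candidate set. In a distributed setting, each round maps to one pass over the data, so keeping the number of rounds small keeps seeding cheap.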

zero323