What exactly is the initializationSteps parameter in k-means++ in Spark MLlib?

Question

I know what k-means is, and I also understand the k-means++ algorithm. I believe the only difference is how the initial k centers are chosen.

In the ++ version, we first choose one center and then choose the remaining k-1 centers using a probability distribution.

In MLlib's k-means implementation, what does the initializationSteps parameter control?

score 2 · Accepted Answer · answered Dec 18 '15 at 01:08

To be precise, k-means++ is an algorithm for choosing the initial centers; it does not describe the whole training process.

MLlib's k-means uses k-means|| for initialization by default, which is a distributed variant of k-means++. Instead of sampling one point at a time, it samples multiple points in each of a fixed number of iterations.

initializationSteps is the number of these iterations; according to the original k-means|| paper, it should be roughly O(log n).
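
To make the role of the iteration count concrete, here is a minimal single-machine sketch of k-means||-style seeding in plain Python. It is an illustration under stated assumptions, not MLlib's actual code: the function name kmeans_parallel_init, the oversampling parameter, and its default of 2 * k are all hypothetical choices made for this example. Each round samples every point independently with probability proportional to its squared distance from the nearest candidate chosen so far, so more rounds produce more candidates.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_parallel_init(points, k, initialization_steps, oversampling=None, seed=0):
    """Return candidate centers after `initialization_steps` sampling rounds.

    Hypothetical helper for illustration only. `oversampling` is the expected
    number of points drawn per round; defaulting it to 2 * k is an assumption.
    """
    rng = random.Random(seed)
    l = oversampling if oversampling is not None else 2 * k

    # Start from a single uniformly chosen point, as in k-means++.
    centers = [rng.choice(points)]

    for _ in range(initialization_steps):
        # Cost of a point = squared distance to its nearest current candidate.
        costs = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(costs)
        if total == 0:
            break  # every point already coincides with a candidate

        # Unlike k-means++, many points can be picked in the same round:
        # each point is kept independently with probability l * cost / total.
        chosen = [p for p, cost in zip(points, costs)
                  if rng.random() < l * cost / total]
        centers.extend(chosen)

    return centers


# Usage: the candidate set grows as the number of rounds grows.
points = [(float(x), float(y)) for x in range(10) for y in range(10)]
for steps in (1, 2, 5):
    candidates = kmeans_parallel_init(points, k=3, initialization_steps=steps)
    print(steps, "rounds ->", len(candidates), "candidate centers")
```

The rounds usually leave more than k candidates, so a full implementation still has to reduce them to exactly k centers (for example by weighting each candidate by the number of points closest to it and clustering the weighted candidates); that step is omitted here. In MLlib itself the number of rounds is set through the initializationSteps parameter, for example with setInitializationSteps on the Scala KMeans object.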
