How does sklearn.cluster.KMeans handle an init ndarray parameter with missing centroids (available centroids less than n_clusters)?

Question

In Python's sklearn KMeans (see documentation), I was wondering what happens internally when passing an ndarray of shape (n, n_features) to the init parameter when n < n_clusters.

Does it drop the given centroids and just start a kmeans++ initialization, which is the default choice for the init parameter? (PDF paper kmeans++) (How does Kmeans++ work)
Does it keep the given centroids and fill in the remaining centroids using kmeans++?
Does it keep the given centroids and fill in the remaining centroids using random values?

I was surprised that this method raises no warning in this case, which is why I want to know how it handles it.

score 2 · Accepted Answer · answered May 11 '15 at 22:09

If you give it a mismatching init it will adjust the number of clusters, as you can see from the source. This is not documented and I would consider it a bug. I'll propose to fix it.

Andreas Mueller


It would be interesting if it filled in the remaining centroids using the kmeans++ initialization method, taking the given centroids into account – belas May 12 '15 at 09:11

We could add an option to do that, but it seems very specific. In general, this is probably a sign of an error in user code. We could add a "fill_clusters='kmeans++'" option that by default raises an error. But I'm not sure it is worth adding this code. You can easily implement it yourself, though. – Andreas Mueller May 12 '15 at 20:14
How might you implement this? [Link to relevant question and background](https://stackoverflow.com/questions/64921503/define-k-1-cluster-centroids-sklearn-kmeans) – Sean Carter Nov 20 '20 at 16:44
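
One possible way to do the "implement it yourself" step from the comments is sketched below. It is only a sketch and not scikit-learn's code: `fill_centroids_kmeanspp` is a hypothetical helper name, it assumes the given centroids should be kept unchanged, and it uses plain D² sampling (each new centroid is a data point drawn with probability proportional to its squared distance to the nearest centroid chosen so far) rather than the greedy variant with several local trials that scikit-learn uses internally.

```python
import numpy as np


def fill_centroids_kmeanspp(X, partial_init, n_clusters, seed=None):
    """Hypothetical helper: extend `partial_init` to `n_clusters` rows.

    The given centroids are kept as-is; missing ones are drawn from the
    rows of X by kmeans++-style D^2 sampling.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = np.asarray(partial_init, dtype=float).reshape(-1, X.shape[1])
    if centers.shape[0] > n_clusters:
        raise ValueError("more initial centroids than n_clusters")

    # With no centroid given at all, start from a uniformly chosen sample.
    if centers.shape[0] == 0:
        centers = X[rng.integers(X.shape[0])][None, :]

    # Squared distance from every sample to its nearest current centroid.
    diffs = X[:, None, :] - centers[None, :, :]
    d2 = (diffs ** 2).sum(axis=2).min(axis=1)

    while centers.shape[0] < n_clusters:
        total = d2.sum()
        if total == 0:
            # Every sample coincides with a centroid: fall back to uniform.
            idx = rng.integers(X.shape[0])
        else:
            idx = rng.choice(X.shape[0], p=d2 / total)
        new_center = X[idx]
        centers = np.vstack([centers, new_center])
        # Only the new centroid can reduce a sample's nearest distance.
        d2 = np.minimum(d2, ((X - new_center) ** 2).sum(axis=1))

    return centers
```

The returned array has shape (n_clusters, n_features), so it can be passed as `init` to `KMeans(n_clusters=n_clusters, init=centers, n_init=1)`. Note that KMeans will still move the given centroids during fitting; this only controls where they start.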
