k-means is a clustering method, i.e. a form of unsupervised rather than supervised learning, and as such it isn't designed to predict on new data: adding more data would change the centers. Supervised alternatives that can do classification include k-NN, LDA/QDA, and SVMs, but such an approach requires a training set with known classes.
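For comparison, here is a minimal sketch of the supervised route using k-NN via `class::knn` (`class` is a recommended package that ships with standard R installs). It assumes a random 100-row training split of `iris`, and `k = 5` is an arbitrary illustrative choice. Unlike `kmeans`, `knn` has to be given the known labels of the training rows through `cl`.

```r
library(class)  # recommended package bundled with R

set.seed(47)
in_train <- sample(nrow(iris), 100)

# Supervised: the training labels (cl) are required
knn_preds <- knn(train = iris[in_train, -5],
                 test  = iris[-in_train, -5],
                 cl    = iris$Species[in_train],
                 k     = 5)  # k = 5 is an illustrative choice

# Predictions are species labels, so they compare directly to the truth
table(knn_preds, iris$Species[-in_train])
```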
All that said, you could write a `predict` method for `stats::kmeans` using `dist`, as you're presumably really looking for the closest center to each point. Hardly optimized, but functional:
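```r
predict.kmeans <- function(object, newdata, ...) {
  centers <- object$centers
  n_centers <- nrow(centers)
  # Pairwise Euclidean distances among the centers and the new rows together
  dist_mat <- as.matrix(dist(rbind(centers, newdata)))
  # Keep rows = new observations, columns = centers
  # (drop = FALSE so a single new row still yields a matrix)
  dist_mat <- dist_mat[-seq(n_centers), seq(n_centers), drop = FALSE]
  # Index of the nearest center for each new observation
  max.col(-dist_mat, ties.method = "first")
}

set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)

test_preds <- predict(mod_kmeans, iris[-in_train, -5])
table(test_preds, iris$Species[-in_train])
#>
#> test_preds setosa versicolor virginica
#>          1      0          0        10
#>          2      0         18         7
#>          3     15          0         0
```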
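Because `dist()` computes every pairwise distance, including those between the new rows themselves, its cost grows quadratically with `nrow(newdata)`. A leaner sketch, assuming plain Euclidean distance as in `kmeans`, computes only the row-to-center distances using the expansion ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2. The name `predict_nearest_center` is hypothetical and not part of `stats`.

```r
# Hypothetical helper (not part of stats): nearest k-means center per row
predict_nearest_center <- function(object, newdata) {
  centers <- object$centers
  # Use the same columns, in the same order, as the fitted centers
  x <- as.matrix(newdata[, colnames(centers), drop = FALSE])
  # Squared Euclidean distances, rows = observations, columns = centers:
  # ||x - c||^2 = ||x||^2 - 2 * x.c + ||c||^2
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  # Column index of the smallest distance in each row
  max.col(-d2, ties.method = "first")
}

set.seed(47)
in_train <- sample(nrow(iris), 100)
mod_kmeans <- kmeans(iris[in_train, -5], 3)

nearest <- predict_nearest_center(mod_kmeans, iris[-in_train, -5])
table(nearest, iris$Species[-in_train])
```

Up to tie-breaking and floating-point rounding, this should assign the same clusters as `predict.kmeans`, while the distance matrix is only `nrow(newdata)` by `k`.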