
The documentation of the initialization methods for clustering with ClusterR is rather cryptic. Could you post a reference to a paper describing the optimal_init method?

This is what is stated officially:

optimal_init: this initializer adds rows of the data incrementally, while checking that they do not already exist in the centroid-matrix

    Maybe it will help to [check the source code](https://github.com/mlampros/ClusterR/blob/master/src/kmeans_miniBatchKmeans_GMM_Medoids.cpp#L130). – tekrei Sep 28 '17 at 11:59

1 Answer


The initializers available in the KMeans_rcpp and MiniBatchKmeans functions of the ClusterR package are kmeans++, random, quantile_init and optimal_init.

I added the last two (quantile_init and optimal_init) to the package because, after testing on various data sets, I found that they give similar or better results (judged by validation metrics) and/or run faster. Both are experimental and, you are right, in the next version of the package I'll add a note to the details section of the documentation. You can see the Rcpp code of quantile_init and optimal_init in the package repository.
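As a rough illustration of the one-line description quoted in the question ("adds rows of the data incrementally, while checking that they do not already exist in the centroid-matrix"), here is a minimal Python sketch. It is not the package's Rcpp code. The function name `incremental_distinct_init`, the random visiting order, and the Euclidean `tol` threshold are assumptions made for illustration. The threshold is loosely modelled on the `tol_optimal_init` argument of `KMeans_rcpp`, whose exact semantics should be checked in the source.

```python
import math
import random


def incremental_distinct_init(data, k, tol=0.0, seed=1):
    """Pick k initial centroids by adding rows one at a time.

    Hypothetical sketch: a row is accepted only if its Euclidean distance
    to every centroid chosen so far is greater than `tol`. With tol=0.0
    this simply skips rows that are exact duplicates of a chosen centroid.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # assumption: rows are visited in random order

    centroids = []
    for i in order:
        row = data[i]
        # Reject the row if it (nearly) exists in the centroid matrix already.
        if all(math.dist(row, c) > tol for c in centroids):
            centroids.append(list(row))
        if len(centroids) == k:
            break

    if len(centroids) < k:
        raise ValueError("not enough sufficiently distinct rows for k centroids")
    return centroids


if __name__ == "__main__":
    points = [[0, 0], [0, 0], [1, 1], [5, 5], [5, 5]]
    print(incremental_distinct_init(points, k=3))
```

Raising `tol` in this sketch forces the chosen rows further apart, at the cost of possibly not finding `k` of them. Whether the package behaves the same way in that case is not something the documentation quoted above settles.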

lampros