I am interested in finding a function
to automatically determine the optimal number of clusters in R.
I am using a sequence algorithm from the package TraMineR
to compute my distances.
library(TraMineR)
data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])
## OM distances ##
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE",
full.matrix = F)
For instance, hclust
can simply be used like this
h = hclust(as.dist(biofam.om), method = 'ward')
and the number of clusters can then be manually determined with
clusters = cutree(h, k = 7)
What I would like ultimately is to automatically set up in the cutree
function the k
number of clusters, based on an "ideal" number of clusters.
It seems that the package clValid
has such function (optimalScores
).
However, I cannot pass a distance matrix into clValid
.
clValid(obj = as.dist(biofam.om), 2:6, clMethods = 'hierarchical')
I get this error
argument 'obj' must be a matrix, data.frame, or ExpressionSet object
I get the same kind of error using other packages such as NbClust
NbClust(diss = as.dist(biofam.om), method = 'ward.D')
Data matrix is needed.
Anyone knows how to solve this or knows other packages?
Thanks.