
I am looking for a function in R that automatically determines the optimal number of clusters.

I am using a sequence-analysis algorithm from the TraMineR package to compute my distances:

library(TraMineR) 

data(biofam)
biofam.seq <- seqdef(biofam[501:600, 10:25])

## OM distances ##
biofam.om <- seqdist(biofam.seq, method = "OM", indel = 3, sm = "TRATE",
                     full.matrix = FALSE)

For instance, hclust can simply be used like this:

h = hclust(as.dist(biofam.om), method = 'ward.D')  # 'ward' was renamed 'ward.D' in R 3.1.0

and the number of clusters can then be set manually with

clusters = cutree(h, k = 7)

Ultimately, I would like k in the cutree call to be set automatically, based on an "ideal" number of clusters.

It seems that the package clValid has such a function (optimalScores). However, I cannot pass a distance matrix into clValid.

clValid(obj = as.dist(biofam.om), 2:6, clMethods = 'hierarchical')

I get this error:

argument 'obj' must be a matrix, data.frame, or ExpressionSet object

I get the same kind of error using other packages such as NbClust:

NbClust(diss = as.dist(biofam.om), method = 'ward.D')  

Data matrix is needed.

Does anyone know how to solve this, or know of other packages that accept a distance matrix?

Thanks.

giac
  • try `NbClust(diss = as.matrix(as.dist(biofam.om)), method = 'ward.D')`. In documentation it says a matrix is required – Sergej Andrejev Jan 27 '18 at 16:50
  • You can use the `kgs` penalty function to get the optimal number of clusters. You will need your `hclust` and distance matrix objects. Also see this [post](https://stackoverflow.com/questions/47776054/r-cluster-analysis-and-dendrogram-with-correlation-matrix/47777081#47777081). – patL Jan 29 '18 at 16:22

1 Answer


There are several different criteria for measuring the quality of a clustering result and choosing the optimal number of clusters. Take a look at the WeightedCluster package: http://mephisto.unige.ch/weightedcluster/WeightedCluster.pdf It lets you easily compare different quality measures across different numbers of clusters.
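
To illustrate what one such criterion looks like in practice, here is a minimal sketch that picks k by average silhouette width, working directly from a `dist` object. It uses `silhouette()` from the cluster package rather than WeightedCluster, so it is a swapped-in technique and not the package's own workflow. The helper name `best_k_by_silhouette`, the range `2:10`, and the toy data are assumptions made for this example; in the question, the distances would be `as.dist(biofam.om)`.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Hypothetical helper: return the k in k_range whose cut of the tree h
# has the highest average silhouette width, computed from the distances d.
best_k_by_silhouette <- function(h, d, k_range = 2:10) {
  avg_width <- sapply(k_range, function(k) {
    sil <- silhouette(cutree(h, k = k), d)
    mean(sil[, "sil_width"])
  })
  names(avg_width) <- k_range
  list(k = k_range[which.max(avg_width)], avg_width = avg_width)
}

# Toy stand-in for the sequence distances: three well-separated groups.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2))
d <- dist(x)  # in the question this would be as.dist(biofam.om)
h <- hclust(d, method = "ward.D")

res <- best_k_by_silhouette(h, d)
clusters <- cutree(h, k = res$k)  # k is now chosen automatically
```

Average silhouette width is only one of several possible criteria, and different criteria can disagree, so it is worth inspecting `res$avg_width` across the whole range rather than trusting the maximum blindly.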

Satu