Optimal number of clusters TraMineR

Question

My problem may seem trivial to most of you. I'm running hierarchical clustering with Ward's method on my data and I would like to identify the optimal number of clusters. This is the plot that shows the hierarchical clustering built from an optimal matching distance. What is the optimal number of clusters in this case, and how can I determine it?

Sample code:

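library(TraMineR)  # seqcost(), seqdist()
library(cluster)   # agnes()

# Substitution costs derived from observed transition rates
costs <- seqcost(df_new.seq, method = "TRATE")
# Optimal matching distances between sequences
df_new.seq.om <- seqdist(df_new.seq, method = "OM", sm = costs$sm, indel = costs$indel)

# Ward hierarchical clustering on the OM distance matrix
clusterward <- agnes(df_new.seq.om, diss = TRUE, method = "ward")

# Dendrogram
dev.new()
plot(clusterward, which.plots = 2)

# Cut the tree into 10 clusters and label them
cl1.4 <- cutree(clusterward, k = 10)
cl1.4fac <- factor(cl1.4, labels = paste("Cluster", 1:10))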

Look at the WeightedCluster package that provides a series of cluster quality measures. The package comes with a very useful vignette. — Gilbert, Aug 25 '21 at 09:30
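
To make the idea of a cluster quality measure concrete, here is a minimal sketch that compares the average silhouette width (ASW) across candidate numbers of clusters. It uses only the cluster package that agnes() already comes from. The toy matrix `x` and the range `ks` are assumptions made for illustration; with the question's data, the distance object `d` would be replaced by `as.dist(df_new.seq.om)`.

```r
library(cluster)  # agnes(), silhouette()

set.seed(1)
# Toy data standing in for the sequences (an assumption for this sketch):
# three groups of 20 points each in two dimensions
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 6), ncol = 2),
  cbind(rnorm(20, mean = 0), rnorm(20, mean = 6))
)
d <- dist(x)  # with the question's data: d <- as.dist(df_new.seq.om)

# Same clustering call as in the question, converted to an hclust tree
tree <- agnes(d, diss = TRUE, method = "ward")
hc <- as.hclust(tree)

# Average silhouette width for each candidate number of clusters
ks <- 2:10
asw <- sapply(ks, function(k) {
  cl <- cutree(hc, k = k)
  mean(silhouette(cl, d)[, "sil_width"])
})
names(asw) <- ks

print(round(asw, 3))
best_k <- ks[which.max(asw)]  # k with the highest ASW
best_k
```

A higher ASW suggests more homogeneous and better separated clusters, but it is only one criterion. WeightedCluster's as.clustrange() reports ASW alongside several other measures for a range of k, and the choice should also make substantive sense for the sequences at hand.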

score 0 · Answer 1 · answered Dec 20 '22 at 15:10

This question is over a year old and the poster has hopefully settled on their clusters by now. For anyone who finds this post while wondering how best to decide on the number of clusters in sequence analysis, I highly recommend this paper on cluster validation. I've found it very useful, and it comes with a step-by-step example.

Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology, 51(2), 290–318. https://doi.org/10.1177/00811750211014232
