My problem may seem trivial to most of you. I'm running hierarchical clustering with Ward's method on my data, and I would like to identify the optimal number of clusters. The plot below shows the hierarchical clustering obtained from an optimal matching distance. What is the optimal number of clusters in this case, and how can I determine it?

Sample code:

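library(TraMineR)  # seqcost(), seqdist()
library(cluster)   # agnes()

# Substitution and indel costs derived from transition rates
costs <- seqcost(df_new.seq, method = "TRATE")
# Pairwise optimal matching (OM) distances between sequences
df_new.seq.om <- seqdist(df_new.seq, method = "OM", sm = costs$sm, indel = costs$indel)

# Ward clustering on the OM dissimilarity matrix
clusterward <- agnes(df_new.seq.om, diss = TRUE, method = "ward")

dev.new()
plot(clusterward, which.plots = 2)  # which.plots = 2 draws the dendrogram

# Cut the tree into 10 clusters and label them
cl1.4 <- cutree(clusterward, k = 10)
cl1.4fac <- factor(cl1.4, labels = paste("Cluster", 1:10))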

[Image: dendrogram produced by plot(clusterward, which.plots = 2)]

Rstudent
    Look at the WeightedCluster package, which provides a series of cluster quality measures. The package comes with a very useful vignette. – Gilbert Aug 25 '21 at 09:30
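
To make the comment above concrete, here is a minimal sketch of how the WeightedCluster quality measures could be used to compare cuts of a Ward tree. It is only an illustration under assumptions: it uses the `mvad` example data shipped with TraMineR (first 200 cases) in place of `df_new.seq`, and picking the cut with the highest average silhouette width (ASW) is one possible criterion, not a rule recommended by the package.

```r
library(TraMineR)
library(cluster)
library(WeightedCluster)

# Assumption: mvad example data stands in for the asker's df_new.seq
data(mvad)
mvad.seq <- seqdef(mvad[1:200, 17:86])

# Same pipeline as in the question: TRATE costs, OM distances, Ward clustering
costs <- seqcost(mvad.seq, method = "TRATE")
om <- seqdist(mvad.seq, method = "OM", sm = costs$sm, indel = costs$indel)
clusterward <- agnes(om, diss = TRUE, method = "ward")

# Cluster quality measures for every cut from 2 to 10 clusters
wardRange <- as.clustrange(clusterward, diss = om, ncluster = 10)
summary(wardRange, max.rank = 2)  # two best cuts per measure

# Standardized curves make the measures comparable on one plot
plot(wardRange, stat = c("ASW", "HC", "PBC"), norm = "zscore")

# One possible criterion: the cut with the highest ASW
# (rows of $stats start at 2 clusters, hence the + 1)
best_k <- which.max(wardRange$stats$ASW) + 1
membership <- wardRange$clustering[[paste0("cluster", best_k)]]
table(membership)
```

In practice the measures rarely all agree, so it is worth reading the plot for local maxima and checking that the resulting clusters are interpretable rather than relying on a single number.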

1 Answer

While this question is over a year old at this point and the poster has hopefully decided on their clusters, for anyone finding this post and wondering the same thing (how do I best decide on the optimal number of clusters when doing sequence analysis?), I highly recommend this paper on cluster validation. I've found it very useful! It comes with a step-by-step example.

Studer, M. (2021). Validating Sequence Analysis Typologies Using Parametric Bootstrap. Sociological Methodology, 51(2), 290–318. https://doi.org/10.1177/00811750211014232
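
For readers who want to try the approach, here is a rough sketch using `seqnullcqi()` from the WeightedCluster package, which, as far as I know, implements the parametric bootstrap described in the paper. Treat it as an assumption-laden illustration: the `mvad` example data, the Hamming distance, the `"combined"` null model and `R = 10` replications are my choices for a quick demo (not the asker's setup), and the argument names are as I recall them from the package documentation, so please check `?seqnullcqi` before relying on them.

```r
library(TraMineR)
library(WeightedCluster)

# Assumption: small mvad subset so the bootstrap runs quickly
data(mvad)
set.seed(1)
mvad.seq <- seqdef(mvad[1:100, 17:86])

# Observed typology: Hamming distances and Ward clustering
diss <- seqdist(mvad.seq, method = "HAM")
hc <- hclust(as.dist(diss), method = "ward.D")
obsRange <- as.clustrange(hc, diss = diss, ncluster = 10)

# Cluster quality under a null model with no cluster structure;
# the same distance and clustering settings are reapplied to each
# simulated data set (R = 10 is far too low for real use)
bcq <- seqnullcqi(mvad.seq, obsRange, R = 10, model = "combined",
                  seqdist.args = list(method = "HAM"),
                  hclust.method = "ward.D")

# Observed ASW per number of clusters against the null distribution
plot(bcq, stat = "ASW", type = "line")
```

The idea is that a number of clusters is only worth considering when its observed quality clearly exceeds what the null model produces.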