
I have the following script, which I use to find the best number of clusters with k-means. How can I change it to use the EM (expectation-maximization) clustering technique instead of k-means?

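Reproducible example:

ourdata <- scale(USArrests)

set.seed(1)  # kmeans() uses random starts, so fix the seed for reproducibility
# Total sum of squares (the within-groups SS of a single cluster)
wss <- (nrow(ourdata) - 1) * sum(apply(ourdata, 2, var))
# Within-groups SS for k = 2..10 clusters
for (i in 2:10) wss[i] <- sum(kmeans(ourdata, centers = i)$withinss)

plot(1:10, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")

Any help is appreciated!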
Ester Silva
  • What's the EM clustering technique? Do you have a reference for that method? EM is a general way to maximize a likelihood, what likelihood are you trying to model? Also, when asking for help you should include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data we can use for testing. – MrFlick Jul 04 '18 at 16:34
  • @MrFlick, EM: Expectation Maximization. "Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1):1–38, 1977." – Ester Silva Jul 04 '18 at 16:41
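In a clustering context, "EM clustering" usually means fitting a Gaussian mixture model by EM and assigning each point to its most probable component. The following is a minimal base-R sketch of that idea, for illustration only, under stated assumptions: diagonal covariance matrices, randomly chosen data rows as starting means, and a hypothetical function name `em_diag_gmm()`. It is not how EMCluster (used in the answer) implements EM; EMCluster fits full covariance matrices.

```r
# Hypothetical helper: minimal EM for a Gaussian mixture with DIAGONAL
# covariances (assumption made for brevity; illustration only).
em_diag_gmm <- function(x, k, max_iter = 200, tol = 1e-8) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  # Initialise: k random rows as means, overall variances, equal weights
  mu <- x[sample.int(n, k), , drop = FALSE]
  s2 <- matrix(apply(x, 2, var), k, p, byrow = TRUE)
  w <- rep(1 / k, k)
  loglik <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: log(w_j) + log N(x_i | mu_j, diag(s2_j)) as an n x k matrix
    logdens <- sapply(seq_len(k), function(j) {
      z <- (t(x) - mu[j, ]) / sqrt(s2[j, ])          # p x n standardised
      log(w[j]) - 0.5 * colSums(z^2 + log(2 * pi * s2[j, ]))
    })
    m <- apply(logdens, 1, max)                      # log-sum-exp trick
    ll_i <- m + log(rowSums(exp(logdens - m)))       # per-point log-likelihood
    resp <- exp(logdens - ll_i)                      # responsibilities, rows sum to 1
    # M-step: responsibility-weighted weights, means and variances
    nk <- colSums(resp)
    w <- nk / n
    mu <- crossprod(resp, x) / nk
    s2 <- pmax(crossprod(resp, x^2) / nk - mu^2, 1e-6)  # floor avoids collapse
    converged <- sum(ll_i) - loglik < tol
    loglik <- sum(ll_i)
    if (converged) break
  }
  list(loglik = loglik, class = max.col(resp, ties.method = "first"),
       weights = w, iter = iter)
}

ourdata <- scale(USArrests)
set.seed(1)
fit3 <- em_diag_gmm(ourdata, k = 3)
table(fit3$class)  # hard cluster sizes

# Elbow-style plot: maximised log-likelihood for k = 1..4
ll <- vapply(1:4, function(k) em_diag_gmm(ourdata, k)$loglik, numeric(1))
plot(1:4, ll, type = "b", xlab = "Number of Clusters", ylab = "Log-likelihood")
```

Here the maximised log-likelihood plays the role that the within-groups sum of squares plays for k-means. Because EM can stop at a local optimum, the curve is not guaranteed to be monotone in k, and a penalised criterion such as BIC is often preferred over the raw log-likelihood.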

1 Answer


The EMCluster package offers a variety of functions for running EM model-based clustering. An example of finding a solution with k = 3 clusters:
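A minimal sketch of such a k = 3 fit, assuming `em.EM()` is called with its default arguments (the arguments spelled out in the update's code appear to be those defaults) and assuming the returned object exposes `class` (hard cluster labels) and `llhdval` (the log-likelihood), as EMCluster's fitted objects normally do:

```r
library(EMCluster)

ourdata <- scale(USArrests)

set.seed(1)  # em.EM() uses random starting values
# Fit a 3-component Gaussian mixture by EM (default em.EM() arguments assumed)
em_fit3 <- em.EM(ourdata, nclass = 3)

table(em_fit3$class)  # cluster sizes from the hard assignments
em_fit3$llhdval       # log-likelihood of the fitted mixture (field name assumed)
```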

Update per OP's comment:

You can calculate the within-cluster sum of squares, along with other metrics of interest, using fpc::cluster.stats(). These can be extracted and plotted just as in your original post. Note that "the elbow technique", as you described it, is an inaccurate description: the elbow technique is a general technique that can be, and is, used with any metric of choice, not only with the within-cluster sum of squares as in your original post.
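Because the elbow method only needs one score per k, the hard labels from any clusterer (EM included) can be scored. A minimal base-R sketch of the WSS score only, as a dependency-free alternative to fpc::cluster.stats(); `within_ss()` is a hypothetical helper name, and the comparison with kmeans()'s `tot.withinss` is just a sanity check of the formula:

```r
# Hypothetical helper: total within-cluster sum of squares for any labelling,
# i.e. the squared distances of each point to its own cluster mean, summed.
within_ss <- function(x, labels) {
  x <- as.matrix(x)
  per_cluster <- vapply(split(seq_len(nrow(x)), labels), function(idx) {
    xc <- x[idx, , drop = FALSE]
    sum(sweep(xc, 2, colMeans(xc))^2)
  }, numeric(1))
  sum(per_cluster)
}

ourdata <- scale(USArrests)
set.seed(1)
km <- kmeans(ourdata, centers = 3)
all.equal(within_ss(ourdata, km$cluster), km$tot.withinss)  # TRUE
```

Passing `em_fit$class` instead of `km$cluster` gives the same kind of score for an EM solution.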

library(EMCluster)  # EM model-based clustering: em.EM()
library(fpc)        # cluster.stats()

ourdata <- scale(USArrests)
dist_fit <- dist(ourdata)  # Euclidean distances, reused for every k

num_clusters <- 2:4

set.seed(1)
# For each k: fit a Gaussian mixture by EM, then score its hard labels
wss <- vapply(num_clusters, function(i_k) {
  em_fit <- em.EM(ourdata, nclass = i_k, lab = NULL, EMC = .EMC,
                  stable.solution = TRUE, min.n = NULL, min.n.iter = 10)
  cluster_stats_fit <- fpc::cluster.stats(dist_fit, em_fit$class)
  cluster_stats_fit$within.cluster.ss
}, numeric(1))

plot(num_clusters, wss, type = "b", xlab = "Number of Clusters", ylab = "Within groups sum of squares")
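Since the elbow technique works with any metric, a likelihood-based score is a natural fit for EM. A minimal sketch of a BIC elbow plot, assuming that `llhdval` holds the maximised log-likelihood and that EMCluster's default unstructured (full) covariance matrices are used; the parameter count below follows from that assumption:

```r
library(EMCluster)

ourdata <- scale(USArrests)
n <- nrow(ourdata); p <- ncol(ourdata)
num_clusters <- 2:4

set.seed(1)
bic <- vapply(num_clusters, function(k) {
  fit <- em.EM(ourdata, nclass = k)
  # Free parameters of a k-component, full-covariance Gaussian mixture:
  # (k - 1) mixing weights + k * p means + k * p * (p + 1) / 2 covariance terms
  n_par <- (k - 1) + k * p + k * p * (p + 1) / 2
  -2 * fit$llhdval + n_par * log(n)  # lower BIC is better
}, numeric(1))

plot(num_clusters, bic, type = "b", xlab = "Number of Clusters", ylab = "BIC")
```

Unlike WSS, BIC penalises the extra parameters of each added component, so it is often read as a minimum rather than as an elbow.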
MHammer
  • Thanks for the answer, but my main question is: how can I use EM clustering with the elbow technique? – Ester Silva Jul 05 '18 at 10:32
  • @EsterSilva - While that requirement could be inferred from your code, it was not explicitly stated in your question. Additionally, "the elbow technique" is an inaccurate description, because the elbow technique can be, and is, used with any metric of choice, not only with WSS as in your example. I've updated my post accordingly with an example of how to fulfill your request. – MHammer Jul 05 '18 at 14:27