
I need help finding the optimal number of clusters when using k-means clustering in R.

My code is

library(cluster)
library(factoextra)

# read data (forward slash: "\f" inside an R string is an escape sequence)
data <- read.csv("../file.txt", header = FALSE, sep = " ")

# determine number of clusters to use
k.max <- 22
wss <- sapply(2:k.max, function(k) {
  kmeans(data, k, nstart = 10)$tot.withinss
})

print(wss)

plot(2:k.max, wss, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")

fviz_nbclust(data, kmeans, method = "wss") + geom_vline(xintercept = 3, linetype = 2)

I get the plot, but I still do not know how to determine the optimal number of clusters from it.

Thanks

My plot (at this link) shows the relation between WSS and the number of clusters, but gives no indication of the optimal number of clusters.

StupidWolf
user4544869

2 Answers

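library(factoextra)
# average silhouette width for k = 1..30; df is the numeric data to cluster
n_clust <- fviz_nbclust(df, kmeans, method = "silhouette", k.max = 30)
n_clust <- n_clust$data
# pick the k with the largest average silhouette width
max_cluster <- as.numeric(as.character(n_clust$clusters[which.max(n_clust$y)]))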
YesThatIsMyName
Yan.Li

There is no sound mathematical definition of the "elbow" (because the x and y axes have different scales, the angle at any point is not well defined), and in plots like yours there probably is no "elbow" at all.
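
To illustrate the point about scales, here is a minimal sketch. The helper `bend_angle` and the WSS values are made up for illustration and are not from the question's data. It computes the angle at each interior point of the curve twice, once with WSS in raw units and once with WSS divided by 10, and the angles come out different even though the curve is the same.

```r
# Hypothetical helper: angle (in degrees) at each interior point of a polyline
bend_angle <- function(x, y) {
  n <- length(x)
  sapply(2:(n - 1), function(i) {
    a <- c(x[i - 1] - x[i], y[i - 1] - y[i])  # vector to the previous point
    b <- c(x[i + 1] - x[i], y[i + 1] - y[i])  # vector to the next point
    cosang <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
    acos(max(-1, min(1, cosang))) * 180 / pi
  })
}

k   <- 1:6
wss <- c(100, 60, 40, 30, 25, 22)  # made-up WSS values, for illustration only

angles_raw    <- bend_angle(k, wss)       # WSS in its original units
angles_scaled <- bend_angle(k, wss / 10)  # same curve, y axis rescaled

print(round(angles_raw, 1))
print(round(angles_scaled, 1))
```

Since rescaling the y axis is arbitrary (it only changes the units of the data), any "sharpest angle" read off the plot depends on that choice.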

Most likely, k-means did not work for any k. This happens quite often, for example when your data doesn't contain clusters.

Try generating uniform data and making the same plot - it will look similar.
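
A minimal sketch of that experiment, assuming 500 points drawn uniformly from the unit square (the sample size, dimension, and seed are arbitrary choices, not taken from the question). It repeats the WSS computation from the question on data that has no cluster structure by construction.

```r
# Assumption: 500 points uniform on the unit square, i.e. no real clusters
set.seed(42)
unif <- matrix(runif(500 * 2), ncol = 2)

# Same WSS-versus-k computation as in the question
k_values <- 2:22
wss_unif <- sapply(k_values, function(k) {
  kmeans(unif, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

plot(k_values, wss_unif, type = "b", pch = 19,
     xlab = "Number of clusters K",
     ylab = "Total within-clusters sum of squares")
```

If the curve from your own data looks much like this one, the plot is not giving evidence for any particular k.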

Has QUIT--Anony-Mousse