I am looking for the optimal number of clusters to be used for a cluster analysis. Therefore I am using a scree plot created by the following code:
WSS = sapply(1:20, FUN=function(k) {
kmeans(df, centers=k, nstart=5)$tot.withinss
})
plot(1:20, WSS, type="l")
abline(v=5, col="red", lty=2)
Now, I know that the bend/knee in the plot indicates the optimal number of clusters. Is there any way of retrieving this value without graphical inspection?
Data looks as follows
df = structure(list(Birthrate = c(18.2, 8.5, 54.1, 1.4, 2.1, 83.6,
17, 1, 0.8, 61.7, 4.9, 7.9, 2, 14.2, 48.2, 17.1, 10.4, 37.5,
1.6, 49.5, 10.8, 6.2, 7.1, 7.8, 3, 3.7, 4.2, 8.7), GDP = c(1.22,
0.06, 0, 0.54, 2.34, 0.74, 1.03, 1.21, 0, 0.2, 1.41, 0.79, 2.75,
0.03, 11.13, 0.05, 2.99, 0.71, 0, 0.9, 1.15, 0, 1.15, 1.44, 0,
0.71, 1.21, 1.45), Income = c(11.56, 146.75, 167.23, 7, 7, 7,
10.07, 7, 7, 7, 47.43, 20.42, 7.52, 7, 7, 15.98, 15.15, 20.42,
7, 22.6, 7, 7, 18.55, 7, 7.7, 7, 7, 7), Population = c(54, 94,
37, 95, 98, 31, 78, 97, 95, 74, 74, 81, 95, 16, 44, 63, 95, 20,
95, 83, 98, 98, 84, 62, 98, 98, 97, 98)), .Names = c("Birthrate",
"GDP", "Income", "Population"), class = "data.frame", row.names = c(NA,
-28L))