I have a dataset with 6497 instances, 12 attributes, and a class variable called q (quality). The class values range from 3 to 9. The data can be downloaded in CSV format from here
I am running k-means clustering on this dataset and would like to plot the result. But something seems wrong with the plots I'm generating, because I don't think they represent the clusters. The plot I'm trying to produce is based on this SO answer: How to create a cluster plot in R?
Here is what I'm doing:
library(vegan)
winequality <- read.csv("wine_nocolor.csv")
express <- winequality[, c("fa", "va", "ca", "rs", "ch", "fsd", "tsd", "d", "p", "s", "a")]
rownames(express) <- winequality$id
str(express) #'data.frame': 6497 obs. of 11 variables
kclus <- kmeans(express, centers = 3, iter.max = 1000, nstart = 10000) #takes a bit of time
wine_dist <- dist(express)
cmd <- cmdscale(wine_dist) #takes a bit of time
groups <- levels(factor(kclus$cluster))
ordiplot(cmd, type = "n") #shows warning that Species scores not available
cols <- c("steelblue", "darkred", "darkgreen")
for (i in seq_along(groups)) {
  points(cmd[factor(kclus$cluster) == groups[i], ], col = cols[i], pch = 16)
}
# add spider and hull
ordispider(cmd, factor(kclus$cluster), label = TRUE)
ordihull(cmd, factor(kclus$cluster), lty = "dotted")
The above code produces the following plot. But as you can see, the clusters are not clearly separated.
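For context on the axes of that plot, here is a minimal sketch of what the two columns returned by cmdscale() contain. It assumes the built-in iris measurements as a stand-in for the wine data (the names X, mds and pc are only illustrative). With Euclidean distances from dist(), the two classical MDS coordinates should match the scores on the first two principal components, up to a sign flip per axis.

```r
# Stand-in data (assumption): the four numeric iris columns, not the wine data
X <- as.matrix(iris[, 1:4])

# Classical MDS on Euclidean distances, keeping two dimensions
mds <- cmdscale(dist(X), k = 2)

# PCA on the same (centered, unscaled) data, keeping the first two components
pc <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:2]

# The axis direction is arbitrary, so compare absolute values
same_coords <- isTRUE(all.equal(abs(unname(mds)), abs(unname(pc)),
                                tolerance = 1e-6))
print(same_coords)
```

If that holds, the two dimensions are just the directions of largest variance in the raw data, so a variable measured on a much larger scale than the others could dominate both axes.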
Questions
- What are Dim1 and Dim2?
- How can I fix this?
- Additionally, does R offer a way to produce a plot similar to the one scikit-learn generates for showing clusters and centroids?
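To make the last question concrete, this is roughly the kind of plot I mean, sketched in base R only. It rests on several assumptions that are not part of my code above: the built-in iris measurements stand in for the wine data, the variables are standardized with scale() before clustering, and the points are projected onto the first two principal components rather than MDS coordinates. The names x, km, scores and centroids are illustrative.

```r
# Stand-in data (assumption): iris measurements, standardized so that
# no single variable dominates the Euclidean distances
x <- scale(iris[, 1:4])

set.seed(42)                                  # reproducible cluster labels
km <- kmeans(x, centers = 3, nstart = 25)

# x is already centered and scaled, so prcomp does not need to do it again
pca <- prcomp(x, center = FALSE, scale. = FALSE)
scores <- pca$x[, 1:2]                        # observations in PC1/PC2 space

# Project the cluster centers into the same two-dimensional space
centroids <- km$centers %*% pca$rotation[, 1:2]

cols <- c("steelblue", "darkred", "darkgreen")
plot(scores, col = cols[km$cluster], pch = 16,
     xlab = "PC1", ylab = "PC2", main = "k-means clusters and centroids")
points(centroids, pch = 4, cex = 2.5, lwd = 3, col = "black")  # centroids as crosses
```

Whether standardizing is appropriate for the wine attributes is itself an open point; without scale() the same code would plot the clusters found on the raw values.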