I have the same problem as in this topic: https://stackoverflow.com/questions/50107157/adding-labels-to-cluster. I followed the answer there, but it still does not work for me. I then tried another solution from https://stackoverflow.com/questions/8120984/scaling-data-in-r-ignoring-specific-columns, which also did not work.

So far my code just follows https://uc-r.github.io/kmeans_clustering and https://afit-r.github.io/kmeans_clustering, as shown below:

1. library(tidyverse)
2. library(cluster)
3. library(factoextra)
4. dataMCU = read.csv("MCU180721.csv")
5. dataMCU <- na.omit(dataMCU)
6. dataMCU <- scale(dataMCU) 

Line 6 fails with this error: Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric

For additional information, the table in my CSV file looks like this:

precint, green, yellow, orange, red
oregon, 6, 7, 8, 9

My question is how to resolve this problem. I recently tried dataMCU <- dataMCU[, c(-1)] before running scale(). This works, but not as expected, since I want the same result as in https://uc-r.github.io/kmeans_clustering and https://afit-r.github.io/kmeans_clustering.

For reference, the plot from https://afit-r.github.io/kmeans_clustering uses the town names as labels.

My plot, however, always shows only numbers, not the city names.

  • Can you add the output of dput(head(dataMCU)) as it is after line 4? – Mohanasundaram Jul 20 '21 at 15:09
  • The result of dput(head(dataMCU)) is shown below: structure(list(Precint = c("Oregon", "Iowa", "Chicago", "Boston", "Idaho", "Texax"), Green = c(77L, 17L, 108L, 27L, 68L, 17L), Yellow = c(25L, 25L, 4L, 28L, 28L, 7L), Orange = c(1L, 7L, 0L, 5L, 11L, 2L), Red = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 6L), class = "data.frame") – ghost square Jul 20 '21 at 15:16

1 Answer

Your data frame contains a character column (Precint), and scale() only accepts numeric data. You can use the precinct names as row names and then remove the first column.

rownames(dataMCU) <- dataMCU$Precint
dataMCU <- dataMCU[,-1]

Then, you can remove NA rows and scale the data frame.

dataMCU <- na.omit(dataMCU)
dataMCU <- scale(dataMCU)
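
As a minimal sketch of why this works, the following rebuilds the six rows from the dput() output in the question comments using base R only. The object names df, err and scaled are illustrative and not part of the original code. The Red column is left out here on purpose: it is all zeros in these six rows, and scale() turns a constant column into NaN.

```r
# Six rows copied from the dput(head(dataMCU)) output in the comments.
# Red is omitted: it is constant (all 0) in this sample, so scale()
# would divide by a standard deviation of 0 and return NaN.
df <- data.frame(
  Precint = c("Oregon", "Iowa", "Chicago", "Boston", "Idaho", "Texax"),
  Green   = c(77L, 17L, 108L, 27L, 68L, 17L),
  Yellow  = c(25L, 25L, 4L, 28L, 28L, 7L),
  Orange  = c(1L, 7L, 0L, 5L, 11L, 2L)
)

# scale() on the raw data frame fails because Precint is character;
# the error message is captured instead of stopping the script.
err <- tryCatch(scale(df), error = function(e) conditionMessage(e))
print(err)

# Move the names into the row names, then drop the character column.
rownames(df) <- df$Precint
df <- df[, -1]

# Now every column is numeric, and the row names are carried
# through into the scaled matrix.
scaled <- scale(df)
print(rownames(scaled))
```

If the full data set also has a column with a single repeated value, it may be worth dropping that column before scaling for the same reason.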

Proceed with distance matrix and clustering:

distance <- get_dist(dataMCU)
fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07"))

clust <- kmeans(dataMCU, centers = 2, nstart = 25)

fviz_cluster(clust, data = dataMCU)
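
Before plotting, a quick way to check that the names survived is to look at the names on the cluster vector returned by kmeans(). This is a sketch in base R on the same six sample rows (Red omitted because it is constant in the sample). The seed value and the object names df, scaled and clust are assumptions made for illustration.

```r
# Sample rows from the question, with the precinct names as row names.
df <- data.frame(
  Green  = c(77L, 17L, 108L, 27L, 68L, 17L),
  Yellow = c(25L, 25L, 4L, 28L, 28L, 7L),
  Orange = c(1L, 7L, 0L, 5L, 11L, 2L),
  row.names = c("Oregon", "Iowa", "Chicago", "Boston", "Idaho", "Texax")
)
scaled <- scale(df)

# Assumption: a fixed seed, used only to make the run reproducible.
set.seed(123)
clust <- kmeans(scaled, centers = 2, nstart = 25)

# The cluster vector is named after the row names of the input.
# If this prints "1", "2", ... instead of precinct names, the row
# names were lost before clustering.
print(names(clust$cluster))
```

If the names printed here are numbers, the plot labels will most likely be numbers as well, so the problem lies in the import or the rownames() step rather than in the plotting call.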

P.S.: To avoid the manual workaround, you can simply import the csv file with the first column as row names:

dataMCU = read.csv("MCU180721.csv", row.names = 1, header = TRUE)
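
To see the effect of row.names = 1 without needing the file, here is a small sketch that reads the same kind of table from an inline string through the text argument of read.csv(). The three rows are taken from the dput() output in the comments, and csv_text and dat are illustrative names.

```r
# Inline stand-in for the csv file (an assumption for illustration;
# the real file is read from disk).
csv_text <- "Precint,Green,Yellow,Orange,Red
Oregon,77,25,1,0
Iowa,17,25,7,0
Chicago,108,4,0,0"

# row.names = 1 turns the first column into row names, so only the
# numeric columns remain in the data frame.
dat <- read.csv(text = csv_text, row.names = 1, header = TRUE)

print(rownames(dat))
print(sapply(dat, is.numeric))
```

With the names stored as row names, no character column is left for scale() to complain about.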

Edit:

Importing the csv with row names:

dataMCU = read.csv("DataPrecint.csv", row.names=1, header=TRUE)
dataMCU <- na.omit(dataMCU)
dataMCU <- scale(dataMCU)
distance <- get_dist(dataMCU)
fviz_dist(distance, gradient=list(low="green", mid="yellow", high="red"))
k4 <- kmeans(dataMCU, centers=4, nstart=25)
str(k4)
k4
fviz_cluster(k4, data = dataMCU)
Mohanasundaram
  • Thank you for your help. I tried this code: `dataMCU = read.csv("precints.csv"); rownames(dataMCU) <- dataMCU$precints; dataMCU <- dataMCU[-1]; dataMCU <- na.omit(dataMCU); dataMCU <- scale(dataMCU); distance <- get_dist(dataMCU); fviz_dist(distance, gradient = list(low = "#00AFBB", mid = "white", high = "#FC4E07")); k4 <- kmeans(dataMCU, centers=4, nstart=25); str(k4); k4; fviz_cluster(k4, data=dataMCU)`. It runs, but the plot still does not display the city names as labels; it still shows numbers. – ghost square Jul 20 '21 at 15:48
  • It's not precints, but Precint. Objects and variables are case sensitive. Try rownames(dataMCU) <- dataMCU$Precint – Mohanasundaram Jul 20 '21 at 18:01
  • Is $Precints the variable that comes from read.csv("Precints.csv")? – ghost square Jul 21 '21 at 00:15
  • I followed your instruction about the variable being case sensitive; the code is at https://pastebin.com/Acb0KciR. The result is the same, though: the town names are not displayed. – ghost square Jul 21 '21 at 03:08
  • Check my edit. If you are importing csv with row names, you don't have to rename the rows and remove the columns. – Mohanasundaram Jul 21 '21 at 09:20
  • Can this function, fviz_dist(distance, gradient=list(low="green", mid="yellow", high="red")), handle four categories, for instance green, yellow, orange and red? – ghost square Jul 24 '21 at 02:13
  • I don't think that this is possible. The gradient argument can be of three colours only. – Mohanasundaram Jul 26 '21 at 05:57
  • I see. We hope R will adjust this in the future. – ghost square Jul 27 '21 at 06:24
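
For readers who want a four-colour scale for the distance matrix, one possible workaround outside fviz_dist() is to build the colour ramp by hand in base R and draw the matrix with heatmap(). This is only a sketch on the six sample rows (Red omitted because it is constant in the sample); it does not change how fviz_dist() behaves, and the names df, scaled, d and pal are illustrative.

```r
# Sample rows from the question, with the precinct names as row names.
df <- data.frame(
  Green  = c(77L, 17L, 108L, 27L, 68L, 17L),
  Yellow = c(25L, 25L, 4L, 28L, 28L, 7L),
  Orange = c(1L, 7L, 0L, 5L, 11L, 2L),
  row.names = c("Oregon", "Iowa", "Chicago", "Boston", "Idaho", "Texax")
)
scaled <- scale(df)

# Euclidean distance matrix, converted to a plain square matrix.
d <- as.matrix(dist(scaled))

# colorRampPalette() interpolates between any number of colours;
# here four anchors are expanded into 100 shades.
pal <- colorRampPalette(c("green", "yellow", "orange", "red"))(100)

# Draw the matrix without reordering or rescaling, so the cells are
# the raw distances and the axes keep the precinct names.
heatmap(d, Rowv = NA, Colv = NA, symm = TRUE, scale = "none", col = pal)
```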