I have a data set, "colleges", that contains many NAs.
library(mlbench)
library(stats)
College <- read.csv("colleges.XL.csv", header=T)
na.college<- na.omit(College)
row.names(na.college) <- NULL
na.college[, c(4:23)] <- scale(as.matrix(na.college[,c(-1,-2,-3)]))
plot(hc<-hclust(dist(na.college[,c(-1,-2,-3)]),method="complete"),hang=-1)
a <- 11
groups <- cutree(hc, a) # cut tree into "a" clusters
# draw dendrogram with red borders around the "a" clusters
rect.hclust(hc, a, border="red")
# your matrix dimensions have to match with the clustering results
# remove any columns from na.college, as you did for clustering
mat <- na.college
# select the rows (observations) of each cluster based on the clustering results
cluster_1 <- mat[which(groups==1),]
cluster_2 <- mat[which(groups==2),]
cluster_3 <- mat[which(groups==3),]
cluster_4 <- mat[which(groups==4),]
cluster_5 <- mat[which(groups==5),]
cluster_6 <- mat[which(groups==6),]
cluster_7 <- mat[which(groups==7),]
cluster_8 <- mat[which(groups==8),]
cluster_9 <- mat[which(groups==9),]
cluster_10 <- mat[which(groups==10),]
cluster_11 <- mat[which(groups==11),]
cluster_1<-rbind(cluster_1[, -(1:3)], colMeans(cluster_1[, -(1:3)]))
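The per-cluster assignments above are repetitive, and I suspect they could be collapsed. A rough sketch on toy data, assuming three leading ID columns as in my data; `toy`, `hc_toy`, `groups_toy`, `clusters` and `centroids` are hypothetical names for illustration, not objects from my script:

```r
# Toy stand-in for na.college: 3 ID columns followed by numeric columns (made-up data)
set.seed(1)
toy <- data.frame(id = 1:20, name = paste0("c", 1:20), state = "XX",
                  x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# Standardize the numeric columns, then cluster them as in the code above
toy[, 4:6] <- scale(toy[, 4:6])
hc_toy <- hclust(dist(toy[, 4:6]), method = "complete")
k <- 4
groups_toy <- cutree(hc_toy, k)

# One data frame per cluster, collected in a list named "1", "2", ...
clusters <- split(toy, groups_toy)

# One row of column means (the centroid) per cluster
centroids <- t(sapply(clusters, function(d) colMeans(d[, 4:6])))
```

With this layout, `clusters[["3"]]` would play the role of `cluster_3`, and `centroids` holds the mean row that the `rbind()` call appends for cluster 1.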
From the standardized data, I built 11 clusters and a separate data set for each cluster. Now consider one observation from the original data, College. It has many NAs, but not all of its values are NA. However, its column values are not standardized.
I want to standardize its non-NA values so that I can figure out which of the 11 clusters it belongs to.
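One idea I am considering, as a rough sketch on toy data rather than a verified solution: keep the column means and standard deviations that `scale()` used on the complete cases (it stores them in the `scaled:center` and `scaled:scale` attributes), apply them to the new observation, and compare it with each cluster centroid using only the columns that are not NA. Note two assumptions: my script overwrites `na.college` with the scaled values, so those attributes would have to be saved first, and nearest-centroid assignment is a substitute of my own, not the same thing as complete linkage. All object names below are hypothetical.

```r
set.seed(42)
# Toy stand-in for the numeric columns of the complete cases (made-up data)
X <- matrix(rnorm(60, mean = 50, sd = 10), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))

# Standardize and keep the parameters that scale() used
Xs  <- scale(X)
ctr <- attr(Xs, "scaled:center")   # column means
scl <- attr(Xs, "scaled:scale")    # column standard deviations

# Cluster the standardized data and compute one centroid row per cluster
groups_toy <- cutree(hclust(dist(Xs), method = "complete"), k = 4)
centroids  <- apply(Xs, 2, function(col) tapply(col, groups_toy, mean))

# New observation on the original scale, with a missing value
new_obs <- c(x1 = 55, x2 = NA, x3 = 40)
new_std <- (new_obs - ctr) / scl   # NA entries simply stay NA

# Euclidean distance to each centroid over the observed columns only
obs <- !is.na(new_std)
d <- sqrt(rowSums(sweep(centroids[, obs, drop = FALSE], 2, new_std[obs])^2))

# Label of the closest cluster
assigned <- as.integer(names(which.min(d)))
```

Because every centroid is compared on the same observed columns, the distances are comparable with each other for this one observation, although an observation with very few non-NA values would be assigned on very little information.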
If you have any answers, please let me know.