I am having trouble understanding and replicating the scale function in R. I know that it performs z-score standardization (when all arguments are left at their defaults), but I am having a hard time obtaining exactly the same scaled values for a particular cluster after clustering is performed. Here is an example:
Let's define a dataset:
set.seed(16)
nc=10
nr=10000
df1 = data.frame(matrix(sample(1:50, size = nr*nc, replace = TRUE), ncol = nc, nrow = nr))
head(df1, n=4)
Before clustering I need to scale the data:
for_clst_km = scale(df1) #standardization with z-scores
Clusters <- kmeans(for_clst_km, 6, iter.max = 100000, nstart = 5)
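For reference, my understanding is that with default arguments scale() subtracts each column's mean and divides by that column's standard deviation, and that it stores both as attributes on the result. Here is a minimal sketch on a small made-up matrix (`m` and its values are purely illustrative and not part of my data):

```r
# Small illustrative matrix (hypothetical values, filled column by column)
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 50), ncol = 2)

z <- scale(m)  # defaults: center = TRUE, scale = TRUE

# Per-column statistics that scale() used, stored as attributes on the result
ctr <- attr(z, "scaled:center")  # column means
scl <- attr(z, "scaled:scale")   # column standard deviations

# Check them against colMeans() and a column-wise sd(), within numeric tolerance
means_ok <- isTRUE(all.equal(ctr, colMeans(m)))
sds_ok   <- isTRUE(all.equal(scl, apply(m, 2, sd)))
```

So the means and standard deviations I compute by hand further down should be the same ones scale() uses.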
After clustering is performed, I can obtain the scaled values for cluster 3:
ver1=for_clst_km[Clusters$cluster==3,]
I now want to replicate ver1 using data from the original dataset df1:
cluster3 = df1[Clusters$cluster==3,]
cluster3$cluster = NULL
for_clst_means = apply(df1,2,mean)
for_clst_sd = apply(df1,2,sd)
ver2 = (sweep(cluster3, 2, for_clst_means))/for_clst_sd
ver3 = apply(cluster3, 2, function(x) ((x-for_clst_means)/for_clst_sd))
Finally, when comparing these three versions, I see that they are different:
all(ver1 == ver2)
[1] FALSE
all(ver1 == ver3)
[1] FALSE
Why is that? And how can I make ver2 or ver3 exactly the same as ver1? Thanks!
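My tentative guess is that two things may be involved: arithmetic with a vector of length ncol recycles it down the rows rather than across the columns, and == is an exact comparison on floating-point values. Below is a rough sketch of what I would try on a small made-up dataset (`toy`, `rows`, `mu` and `sg` are hypothetical names used only for illustration, with `rows` standing in for `Clusters$cluster==3`):

```r
# Illustrative data (hypothetical), not the df1 above
set.seed(1)
toy <- data.frame(matrix(sample(1:50, 40, replace = TRUE), ncol = 4))

ref <- scale(toy)          # reference z-scores (a matrix with attributes)
mu  <- colMeans(toy)       # column means of the full data
sg  <- apply(toy, 2, sd)   # column standard deviations of the full data

rows <- 1:3                # stand-in for a cluster membership selection

# Centre and divide column by column: sweep() over margin 2 for both steps
manual <- sweep(sweep(as.matrix(toy[rows, ]), 2, mu, "-"), 2, sg, "/")

# Compare with a numeric tolerance, ignoring attribute and dimnames differences
same <- isTRUE(all.equal(manual, ref[rows, ], check.attributes = FALSE))
```

In this sketch `same` comes out TRUE, but I am not sure whether this is the right or the idiomatic way to do it on my actual data.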