I am trying to compute the mean of clusters that I assigned randomly with `cluster = sample(1:2,n,replace=T)`, for `n = 50` and with the vectors `x = rnorm(n)` and `y = rnorm(n)`.
I then created a data frame so I could see `x`, `y`, and their randomly assigned clusters:
```r
data = data.frame(x,y,cluster)
```
This gave the following result (first 10 rows shown):
```
             x           y cluster
1  -0.89691455  0.41765075       2
2   0.18484918  0.98175278       1
3   1.58784533 -0.39269536       1
4  -1.13037567 -1.03966898       1
5  -0.08025176  1.78222896       2
6   0.13242028 -2.31106908       2
7   0.70795473  0.87860458       2
8  -0.23969802  0.03580672       1
9   1.98447394  1.01282869       2
10 -0.13878701  0.43226515       2
```
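The setup above can be reproduced with the sketch below. `set.seed(1)` is an assumption added so the run is repeatable, so its numbers will not match the output shown. It also prints the column names and the cluster sizes, which are worth checking before computing per-cluster means.

```r
# Rebuild the example data. set.seed(1) is an added assumption (not in the
# original post), so the values differ from the output shown above.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rnorm(n)
cluster <- sample(1:2, n, replace = TRUE)  # TRUE instead of T, which can be reassigned
data <- data.frame(x, y, cluster)

# The column names are exactly "x", "y" and "cluster" (there is no column "C").
print(names(data))

# Number of rows assigned to each cluster.
print(table(data$cluster))
```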
Now I want the mean of each cluster, i.e. the mean of cluster 1 and the mean of cluster 2. I tried:
```r
m1 = sum(data[data$C==1])/sum(data$cluster==1)
```
This doesn't give the value I want. What I expect, for each of clusters 1 and 2, is the mean of all the `x` and `y` values in that cluster combined.
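A minimal sketch of one way to get those pooled means, assuming "mean of a cluster" means the mean of all `x` and `y` values of its rows taken together. `set.seed(1)` and the helper `pooled_mean` are illustrative additions, not part of the original code. The sketch also checks one reason the attempt above goes wrong: `data$C` is `NULL`, because `$` partial matching is case-sensitive and no column name starts with "C". Separately, `data[...]` with a single index selects columns rather than rows, so rows have to be picked with `data[rows, columns]`.

```r
# Rebuild the example data; set.seed(1) is an assumption for reproducibility.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- rnorm(n)
cluster <- sample(1:2, n, replace = TRUE)
data <- data.frame(x, y, cluster)

# data$C does not match "cluster" (partial matching is case-sensitive),
# so the logical index in the original attempt is empty.
stopifnot(is.null(data$C))

# Hypothetical helper: select the ROWS of cluster k with [rows, columns],
# then pool the x and y values and take their mean.
pooled_mean <- function(df, k) {
  vals <- unlist(df[df$cluster == k, c("x", "y")])
  mean(vals)
}
m1 <- pooled_mean(data, 1)
m2 <- pooled_mean(data, 2)

# Same quantity for all clusters at once: each row contributes one x and
# one y, so the pooled mean equals the within-cluster mean of (x + y) / 2.
all_means <- tapply((data$x + data$y) / 2, data$cluster, mean)

print(c(m1 = m1, m2 = m2))
print(all_means)
```

If the goal were instead the mean of `x` and the mean of `y` separately (the cluster centroid), `aggregate(cbind(x, y) ~ cluster, data = data, FUN = mean)` returns one row per cluster.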