I have a data set with 30 variables. One of them is an indicator variable (0 or 1). For certain columns, I would like to subtract from every value the mean of that column computed only over the rows where the label is 1 (something like centering, but using the mean of a subset of rows instead of the entire column).
Col2 Col3 Col4 label
400 322 345 1
131 345 809 1
565 676 311 0
121 645 777 0
322 534 263 0
545 222 111 0
For the above dataset, I would like to perform the following operation on Col2:Col4:
$x_{i,j} - x'_{j}$
where $x_{i,j}$ is the cell in row $i$ and column $j$, and $x'_{j}$ is the mean of column $j$ over the rows where label = 1. For example, for [3,1] (row 3 of Col2) the result should be
565 - (400 + 131)/2 = 565 - 265.5 = 299.5
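The definition above can be checked for a single column with a minimal base R sketch, using the Col2 values and labels from the sample table (the names col2, label, ref_mean and centred are illustrative, not from the real data):

```r
# Col2 and the 0/1 indicator from the sample table in the question
col2  <- c(400, 131, 565, 121, 322, 545)
label <- c(1, 1, 0, 0, 0, 0)

# x'(j): mean of the column over the rows where label == 1
ref_mean <- mean(col2[label == 1])   # (400 + 131) / 2 = 265.5

# x(i,j) - x'(j) for every row i of the column
centred <- col2 - ref_mean
print(centred)
```

Note that in R the reference mean has to be written as mean(c(400, 131)); mean(400, 131) would treat 131 as the trim argument rather than as data.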
Expected output for Col2:
Col2
134.5
-134.5
299.5
-144.5
56.5
279.5
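Extending the same operation to Col2:Col4 at once, a base R sketch (not dplyr) could compute one reference mean per column from the label == 1 rows and pass those means to scale() as its center argument, which subtracts each column's own center value; this matches the "centering with a different mean" idea. The data frame is the sample table, and df, cols and ref_means are illustrative names:

```r
# Sample table from the question; "label" is the 0/1 indicator
df <- data.frame(
  Col2  = c(400, 131, 565, 121, 322, 545),
  Col3  = c(322, 345, 676, 645, 534, 222),
  Col4  = c(345, 809, 311, 777, 263, 111),
  label = c(1, 1, 0, 0, 0, 0)
)
cols <- c("Col2", "Col3", "Col4")

# One mean per column, computed only over the label == 1 rows
ref_means <- colMeans(df[df$label == 1, cols])

# scale() with a numeric center subtracts center[j] from column j;
# scale = FALSE stops it from also dividing by a standard deviation
df[cols] <- scale(df[cols], center = ref_means, scale = FALSE)
print(df)
```

The reference means must be computed before the columns are overwritten; the label column is left unchanged.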
I have been trying to use the summarise_each command, but have been unsuccessful so far. The command I'm using is:
try <- group_by(data, lbl) %>% select(c(4, 13:26)) %>% summarise_each(funs((.) - (mean(data[data$lbl == 1, ]))))
But this generates NA, and I'm not sure where I'm going wrong (I'm sure the problem is in the summarise_each call, where I can't figure out how to use funs() correctly).
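One likely source of the NA, offered as a guess rather than a confirmed diagnosis: mean(data[data$lbl==1,]) passes a whole data frame to mean(), and R does not average a data frame column by column. The sketch below reproduces this on the sample table (it uses the column name label from the table; the real data apparently calls it lbl) and shows colMeans() returning the per-column means instead:

```r
df <- data.frame(
  Col2  = c(400, 131, 565, 121, 322, 545),
  Col3  = c(322, 345, 676, 645, 534, 222),
  Col4  = c(345, 809, 311, 777, 263, 111),
  label = c(1, 1, 0, 0, 0, 0)
)

# mean() on a data frame (a list, not a numeric vector) warns
# "argument is not numeric or logical: returning NA" and returns NA
bad <- suppressWarnings(mean(df[df$label == 1, ]))
print(bad)

# colMeans() gives one mean per column of the label == 1 rows
good <- colMeans(df[df$label == 1, c("Col2", "Col3", "Col4")])
print(good)
```

Separately, summarise_each() returns one row per group, so a row-by-row subtraction fits mutate better, and group_by(lbl) would make any mean inside it be computed per group rather than over the label == 1 rows only. With dplyr 1.0 or later, something along the lines of mutate(across(Col2:Col4, ~ .x - mean(.x[label == 1]))) on the ungrouped data should express the same operation, though that snippet is not verified here.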
Any help is appreciated. Thanks!