I would like to demean multiple columns in an R data.frame. Using an example from this question:
set.seed(999)
library(plyr)
library(plm)
# random data.frame
dat <- expand.grid(id=factor(1:3), cluster=factor(1:6))
dat <- cbind(dat, x=runif(18), y=runif(18, 2, 5))
# demean x and y within each cluster
dat.2 <- ddply(dat, .(cluster), transform, x=x-mean(x), y=y-mean(y))
My problem is that I have many more than two variables, and I would like to avoid hard-coding each variable name. I'm new to plyr in general; why does this
dat.2 <- ddply(dat[,c(x,y)], .(cluster), transform, function(x) x - mean(x))
not work? Is there some crucial step that I'm missing? Is there a better way to do this in general?
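
For reference, here is a minimal sketch of one way this could be done without naming each variable in the call. It uses base R's ave() rather than plyr, so it is a swapped-in technique and not a fix of the ddply call itself. The vector name vars is hypothetical and is assumed to hold the names of the columns to demean. As far as I can tell, the attempt above fails for a few reasons: c(x, y) refers to objects x and y that do not exist outside the data.frame (the column names would need to be quoted), subsetting to those columns drops cluster, and transform expects name = value arguments rather than a function.

```r
# Rebuild the example data so this block runs on its own
set.seed(999)
dat <- expand.grid(id = factor(1:3), cluster = factor(1:6))
dat <- cbind(dat, x = runif(18), y = runif(18, 2, 5))

# Assumption: `vars` (a hypothetical name) lists the columns to demean
vars <- c("x", "y")

dat.2 <- dat
# ave() returns each row's group mean (FUN defaults to mean),
# aligned with the original row order, so subtracting it demeans by cluster
dat.2[vars] <- lapply(dat[vars], function(v) v - ave(v, dat$cluster))
```

With more variables, vars could be built programmatically, for example from names(dat) minus the id and grouping columns. Unlike ddply, this keeps the rows in their original order.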