Demean R data frame

Question

I would like to demean multiple columns in an R data.frame. Using an example from this question:

set.seed(999)
library(plyr)
library(plm)
# random data.frame
dat <- expand.grid(id=factor(1:3), cluster=factor(1:6))
dat <- cbind(dat, x=runif(18), y=runif(18, 2, 5))

#demean x and y
dat.2 <- ddply(dat, .(cluster), transform, x=x-mean(x), y=y-mean(y))

My problem is that I have many more than two variables, and I would like to avoid hard-coding each one. I'm new to plyr in general; why does this

dat.2 <- ddply(dat[,c(x,y)],  .(cluster), transform, function(x) x - mean(x))

not work? Is there some crucial step that I'm missing? Is there a better way to do this in general?

you could melt your data.frame into long format, `m = melt(dat, id=c("cluster", "id")); ddply(m, c("cluster", "variable"), mutate, value = value - mean(value))`. — baptiste, May 27 '14 at 23:48
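For anyone who wants to try the long-format suggestion in the comment above, here is a minimal sketch. It assumes `melt` comes from the reshape2 package (the comment does not say which package), and the final `dcast` step back to wide format is my addition, not part of the comment.

```r
library(plyr)
library(reshape2)  # assumption: melt()/dcast() are taken from reshape2

set.seed(999)
dat <- expand.grid(id = factor(1:3), cluster = factor(1:6))
dat <- cbind(dat, x = runif(18), y = runif(18, 2, 5))

# Long format: one row per (cluster, id, variable) combination
m <- melt(dat, id.vars = c("cluster", "id"))

# Demean 'value' within each cluster/variable group
m.2 <- ddply(m, c("cluster", "variable"), mutate, value = value - mean(value))

# Optional: reshape back to one column per original variable
dat.wide <- dcast(m.2, cluster + id ~ variable, value.var = "value")
```

Because every measured column ends up in the single `value` column, no variable names have to be listed by hand.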

flodel · Accepted Answer · 2014-05-28T00:01:11.567

6

Have a look at the colwise functor. The only thing to be careful about is the id column, which is a factor rather than a numeric. Hence:

demean <- colwise(function(x) if(is.numeric(x)) x - mean(x) else x)
dat.2 <- ddply(dat, .(cluster), demean)

Edit: as you found, there is even a numcolwise functor for only dealing with numerics so you can do:

demean <- numcolwise(function(x) x - mean(x))
dat.2 <- ddply(dat, .(cluster), demean)

You can also use the scale function rather than define your own function:

dat.2 <- ddply(dat, .(cluster), numcolwise(scale, scale = FALSE))
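If you would rather not depend on plyr, roughly the same idea can be sketched in base R with `ave()`. This is not from the answer above, just one possible alternative; the names `num_cols`, `dat.3` and `group_means` are made up for illustration. The last lines check that every cluster mean is zero after demeaning.

```r
set.seed(999)
dat <- expand.grid(id = factor(1:3), cluster = factor(1:6))
dat <- cbind(dat, x = runif(18), y = runif(18, 2, 5))

# Pick up the numeric columns automatically so nothing is hard-coded
num_cols <- names(dat)[sapply(dat, is.numeric)]

# ave() applies FUN within each level of the grouping factor
# and returns a vector in the original row order
dat.3 <- dat
dat.3[num_cols] <- lapply(dat[num_cols], function(col) {
  ave(col, dat$cluster, FUN = function(v) v - mean(v))
})

# Sanity check: each cluster mean should be zero up to floating-point error
group_means <- aggregate(dat.3[num_cols],
                         by = list(cluster = dat.3$cluster), FUN = mean)
stopifnot(all(abs(as.matrix(group_means[num_cols])) < 1e-12))
```

Unlike `ddply`, this keeps the rows in their original order, which may or may not matter for your use case.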

edited May 28 '14 at 00:01

answered May 27 '14 at 23:53


1

Cool... would `numcolwise` (just reading the `?` dialog) obviate the need for the `if(...)` statement? – gregmacfarlane May 27 '14 at 23:55
