How to use ddply to get weighted-mean of class in dataframe?

Question

I'm new to plyr and want to take the weighted mean of values within a class to reshape a dataframe for multiple variables. Using the following code, I know how to do this for one variable, such as x2:

library(plyr)
set.seed(123)
frame <- data.frame(class = sample(LETTERS[1:5], replace = TRUE),
                    x = rnorm(20), x2 = rnorm(20), weights = rnorm(20))
ddply(frame, .(class), function(x) data.frame(weighted.mean(x$x2, x$weights)))

However, I would like the code to create a new data frame containing the weighted means of both x and x2 (and of any number of other variables in the frame). Does anybody know how to do this? Thanks

(You know you have to assign the output of `ddply` to something, right?) — smci, Mar 31 '14 at 22:45

score 7 · Accepted Answer · edited May 23 '17 at 10:28

You might find what you want in the ?summarise function. I can replicate your code with summarise as follows:

library(plyr)
set.seed(123)
frame <- data.frame(class=sample(LETTERS[1:5], replace = TRUE), x=rnorm(20), 
                    x2 = rnorm(20), weights=rnorm(20))
ddply(frame, .(class), summarise, 
      x2 = weighted.mean(x2, weights))

To do this for x as well, just add that line to be passed into the summarise function:

ddply(frame, .(class), summarise, 
      x = weighted.mean(x, weights),
      x2 = weighted.mean(x2, weights))
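
For reference, here is a minimal sketch of what weighted.mean computes within each class. The vectors vals and w are made up for illustration and are not taken from the question's data:

```r
# Illustrative values and weights (hypothetical, not from the question's frame)
vals <- c(1, 2, 4)
w    <- c(1, 1, 2)

# A weighted mean is the weight-scaled sum divided by the sum of the weights
manual <- sum(vals * w) / sum(w)

# The built-in function gives the same number
builtin <- weighted.mean(vals, w)
stopifnot(isTRUE(all.equal(manual, builtin)))
```

Note that the example frame draws its weights from rnorm, so some weights are negative and a per-class result can fall outside the range of the values being averaged. With non-negative weights that cannot happen.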

Edit: If you want to apply an operation over many columns, use colwise or numcolwise instead of summarise, or run summarise on a melted data frame (reshape2 package) and then cast back to the original form. Here's an example using colwise:

wmean.vars <- c("x", "x2")

ddply(frame, .(class), function(x)
      colwise(weighted.mean, w = x$weights)(x[wmean.vars]))

Finally, if you don't like having to specify wmean.vars, you can also do:

ddply(frame, .(class), function(x)
      numcolwise(weighted.mean, w = x$weights)(x[!colnames(x) %in% "weights"]))

which will compute a weighted average for every numeric column, excluding the weights themselves.
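
The melt-and-cast route mentioned in the edit is not shown above. A rough sketch of how it might look, assuming the reshape2 package is installed and using the same frame as in the question (the names long, long_means, wide and wmean are just illustrative):

```r
library(plyr)
library(reshape2)

set.seed(123)
frame <- data.frame(class = sample(LETTERS[1:5], replace = TRUE),
                    x = rnorm(20), x2 = rnorm(20), weights = rnorm(20))

# Long format: one row per original row and measured variable,
# keeping class and weights as identifier columns
long <- melt(frame, id.vars = c("class", "weights"))

# One weighted mean per (class, variable) pair
long_means <- ddply(long, .(class, variable), summarise,
                    wmean = weighted.mean(value, weights))

# Cast back to wide format: one row per class, one column per variable
wide <- dcast(long_means, class ~ variable, value.var = "wmean")
wide
```

This avoids naming each variable, since every column other than class and weights is treated as a measured variable. It should give the same numbers as the colwise version, at the cost of an extra reshape step.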

Thanks, this works. Is there a way to do this so you don't have to specify the function for each new variable? I'm working with a dataset with a 100 variables, so that would take a while! — coding_heart, Aug 23 '13 at 00:01
Thanks @flodel for filling in my very terse explanation. Following @thelatemail below, one could use `wmean.vars <- setdiff(names(frame), c("class","weights"))` to avoid specifying `x` and `x2`. — Frank, Aug 23 '13 at 00:28
well, thanks for mentioning `numcolwise`, I had never seen it before. — flodel, Aug 23 '13 at 00:38

score 3 · Answer 2 · thelatemail · 2013-08-23T00:50:11.033

A data.table answer for fun, which also doesn't require specifying all the variables individually.

library(data.table)
frame <- as.data.table(frame)
keynames <- setdiff(names(frame),c("class","weights"))
frame[, lapply(.SD,weighted.mean,w=weights), by=class, .SDcols=keynames]

Result:

   class          x         x2
1:     B  0.1390808 -1.7605032
2:     D  1.3585759 -0.1493795
3:     C -0.6502627  0.2530720
4:     E  2.6657227 -3.7607866

edited Aug 23 '13 at 00:50

answered Aug 23 '13 at 00:10

+1 for `data.table`. Note that the `.SD` is unnecessary on `weights` (and there should be a workaround for `keynames` too, in theory): `frame[,lapply(.SD[,keynames,with=FALSE],weighted.mean,w=weights),by=class]` has the same result. – Frank Aug 23 '13 at 00:25
