I want to compute a weighted mean within each level of a factor, using the code below:

factor <- factor(cut(var1, quantile(var1, seq(0,1,0.1))))
var2_split = split(vat2, factor)
weight_split = split(weight, factor)
sapply(var2_split, weighted.mean, weight_split)

I get the following error:

Error in FUN(X[[1L]], ...) : 'x' and 'w' must have the same length

How do I format my vector and weights for sapply?

As an example, suppose I have a matrix m with three columns x, y, z, where x is a set of target values, y is a set of weights, and z is the variable I want to bucket on. Specifically, I want weighted.mean(x, y) computed within each quartile of z.

# Code that doesn't work 

x <- c(1,2,3,4,5,6)
y <- c(6,3,4,2,3,4)
z <- c(1,1,2,3,3,4)
m <- matrix(c(x, y, z), nrow = 6, ncol = 3)
# bucket z by quartile.
z.factor <- cut(z, quantile(z, seq(0,1,0.25)), include.lowest=TRUE)
x.split = split(x, z.factor)
y.split = split(y, z.factor)
# want to bucket weighted.mean(x,y) on quartiles of z
sapply(x.split, weighted.mean, y.split)
user196711
  • 2
    You can only sapply over one vector/list at a time. If you want to simultaneously iterate along var2_split and weight_split, try `mapply` or `Map` instead. It would be easier to give a more specific answer if you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with your question. – MrFlick Sep 02 '14 at 19:21
  • Does mapply work for the example above? – user196711 Sep 03 '14 at 18:50
  • 1
    Your example above isn't [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Provide some sample input data (ie `var1`, `vat2`, `weight`, etc) so it's possible to run and test on data that looks like your actual input. – MrFlick Sep 03 '14 at 18:52
  • Sorry, the code above is an attempt at what I want to do. I have two problems: (1) grouping by quartiles and (2) applying weighted.mean on each group. – user196711 Sep 03 '14 at 20:01

1 Answer

With your specific sample, try

#first, note the include.lowest=TRUE to get all values
z.factor <- factor(cut(z, quantile(z, seq(0,1,0.25)), include.lowest=TRUE))

#same
x.split = split(x, z.factor)
y.split = split(y, z.factor)

# here we use mapply
mapply(weighted.mean, x.split, y.split)

This gives:

[1,1.25] (1.25,2.5]    (2.5,3]      (3,4] 
1.333333   3.000000   4.600000   6.000000 

which seems correct given your sample input.
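As for why the original call failed: `sapply(x.split, weighted.mean, y.split)` hands the whole list `y.split` (length 4) to every call as `w`, so its length never matches the one or two values in each group. If you would rather stay with `sapply`, one possible sketch is to split the row indices once and subset both vectors inside an anonymous function. The names `idx.split` and `res` are made up for this illustration and are not from the question.

```r
x <- c(1, 2, 3, 4, 5, 6)   # target values
y <- c(6, 3, 4, 2, 3, 4)   # weights
z <- c(1, 1, 2, 3, 3, 4)   # variable to bucket on

# bucket z by quartile, keeping the lowest value in the first bin
z.factor <- cut(z, quantile(z, seq(0, 1, 0.25)), include.lowest = TRUE)

# split the row indices rather than the values (idx.split is a hypothetical name)
idx.split <- split(seq_along(x), z.factor)

# each call receives the indices of one bucket and subsets x and y together
res <- sapply(idx.split, function(i) weighted.mean(x[i], y[i]))
res
```

This should return the same values as the `mapply` call, and it keeps the values and weights aligned by construction, which may help if more than two vectors need to be grouped.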

MrFlick