
How do I compute the weighted mean in R?

For example, I have 4 elements of which 1 element is of size (or: length, width, etc.) 10 and 3 elements are of size 2.

> z = data.frame(count=c(1,3), size=c(10,2))
> z
  count size
1     1   10
2     3    2

The weighted average is (10 * 1 + 2 * 3) / 4 = 4.

Frank
  • Speaking for myself, I downvoted because a google search for "weighted average in R" returns the help page for weighted.mean as the very first result. – joran Jun 12 '12 at 04:35
  • @Frank Hover over the down triangle beneath the vote count next to your Q. The tool tip says: "This question does not show any research effort; ...". Given that someone has already asked a very similar Q here that could easily be found via a search, and a Google search takes you to the correct Answer, that may be why you got downvotes and had your Q closed. – Gavin Simpson Jun 12 '12 at 07:38
  • The other question appears to be different: the OP there is asking about weighted variance, as he clarified in his comment on the accepted answer: _"yes, i'm looking for weighted variance though. not mean – Alex Apr 8 '12 at 2:26"_ – Chris Snow Jun 11 '15 at 06:08
  • Voting to reopen; as @ChrisSnow notes, the [other question](http://stackoverflow.com/questions/10049402/calculating-weighted-mean-and-standard-deviation) seems different, and in any case is *much* less clear than this one. – Ilmari Karonen Apr 05 '16 at 09:21

3 Answers


Use weighted.mean:

> weighted.mean(z$size, z$count)
[1] 4
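If the data can contain missing values, weighted.mean() also takes an na.rm argument. A minimal sketch, extending the question's numbers with an NA:

```r
# NA entries in x (and their matching weights) are dropped when na.rm = TRUE
x <- c(10, 2, NA)
w <- c(1, 3, 5)
weighted.mean(x, w, na.rm = TRUE)
# [1] 4    # i.e. (10*1 + 2*3) / (1 + 3)
```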
Frank

Seems like you already know how to calculate this; you just need a nudge in the right direction to implement it. Since R is vectorized, this is pretty simple:

with(z, sum(count*size)/sum(count))

The with() bit just saves on typing; it is equivalent to sum(z$count*z$size)/sum(z$count).

Or use the built-in function weighted.mean(), as you also pointed out. Rolling your own can prove faster, though it will not do the same amount of error checking as the built-in function.

builtin <- function() with(z, weighted.mean(size, count))
rollyourown <- function() with(z, sum(count*size)/sum(count))

require(rbenchmark)
benchmark(builtin(), rollyourown(),
          replications = 1000000,
          columns = c("test", "elapsed", "relative"),
          order = "relative")
#-----
           test elapsed relative
2 rollyourown()   13.26 1.000000
1     builtin()   22.84 1.722474
Chase

Another option is collapse::fmean, which takes a w argument for weights and is noticeably faster:

library(collapse)
fmean(z$size, w = z$count)
#[1] 4
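fmean can also take a grouping vector via its g argument, so per-group weighted means need no extra code. A sketch using the question's data with a hypothetical grp column added:

```r
library(collapse)
z <- data.frame(count = c(1, 3, 2, 2),
                size  = c(10, 2, 5, 1),
                grp   = c("a", "a", "b", "b"))
# weighted mean of size within each group, weights = count
fmean(z$size, g = z$grp, w = z$count)
# group "a": (10*1 + 2*3)/4 = 4; group "b": (5*2 + 1*2)/4 = 3
```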

Benchmark with 10,000 rows:

# Unit: microseconds
#           expr     min      lq      mean   median       uq     max neval
#      builtin() 165.801 239.401 257.67796 246.9515 263.2015 508.201   100
#  rollyourown()  45.501  73.701  81.57205  75.7510  79.7010 196.000   100
#     collapse()  26.301  27.901  32.51103  28.7510  30.7510 122.801   100

Code for benchmark:

library(collapse)
library(microbenchmark)
z = data.frame(count = rnorm(10000), size = runif(10000))
collapse <- function() fmean(z$size, w = z$count)
builtin <- function() with(z, weighted.mean(size, count))
rollyourown <- function() with(z, sum(count*size)/sum(count))
microbenchmark(builtin(), rollyourown(), collapse())
Maël