
Everything is in the question! I was just doing a bit of optimization and nailing down the bottlenecks, and out of curiosity I tried this:

library(microbenchmark)

t1 <- rnorm(10)
microbenchmark(
  mean(t1),
  sum(t1)/length(t1),
  times = 10000)

and the result is that mean() is 6+ times slower than the computation "by hand"!

Does it stem from the overhead in the R code of mean() before the call to .Internal(mean), or is it the C code itself that is slower? Why? Is there a good reason for this, and thus a good use case?

Antoine Lizée
  • I saw this discussed briefly here (Section 3): http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table#method_dispatch_takes_time – Frank Sep 04 '13 at 02:24
  • Also, under "Standard and internal functions" on this blog post: http://www.noamross.net/blog/2013/4/25/faster-talk.html – Frank Sep 04 '13 at 02:25
  • It is also worth looking at @hadley's answer: http://stackoverflow.com/questions/16386154/is-s4-method-dispatch-slow – mnel Sep 04 '13 at 05:40
  • Because it is not the same algorithm as sum(t)/length(t). Cf. http://stackoverflow.com/questions/17866149/what-algorithm-is-r-using-to-calculate-mean – Karl Forner Sep 04 '13 at 11:27

2 Answers


It is due to the S3 look-up for the method, and then the necessary handling of the arguments in mean.default (and also the other code in mean).

sum and length are both primitive functions, so they will be fast (but how are you handling NA values?).

t1 <- rnorm(10)
microbenchmark(
  mean(t1),
  sum(t1)/length(t1),
  mean.default(t1),
  .Internal(mean(t1)),
  times = 10000)

Unit: nanoseconds
                expr   min    lq median    uq     max neval
            mean(t1) 10266 10951  11293 11635 1470714 10000
  sum(t1)/length(t1)   684  1027   1369  1711  104367 10000
    mean.default(t1)  2053  2396   2738  2739 1167195 10000
 .Internal(mean(t1))   342   343    685   685   86574 10000

The internal bit of mean is even faster than sum/length.
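
You can see where the remaining overhead comes from by inspecting the functions themselves (a quick illustration; the exact printed bodies depend on your R version):

mean                  # the S3 generic: function (x, ...) UseMethod("mean")
body(mean.default)    # trim/na.rm handling before the final .Internal(mean(x)) call
is.primitive(sum)     # TRUE: no R-level code to run before the C implementation
is.primitive(length)  # TRUE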

See http://rwiki.sciviews.org/doku.php?id=packages:cran:data.table#method_dispatch_takes_time (mirror) for more details (and a data.table solution that avoids .Internal).

Note that if we increase the length of the vector, the primitive approach is fastest:

t1 <- rnorm(1e7)
microbenchmark(
  mean(t1),
  sum(t1)/length(t1),
  mean.default(t1),
  .Internal(mean(t1)),
  times = 100)

Unit: milliseconds
                expr      min       lq   median       uq      max neval
            mean(t1) 25.79873 26.39242 26.56608 26.85523 33.36137   100
  sum(t1)/length(t1) 15.02399 15.22948 15.31383 15.43239 19.20824   100
    mean.default(t1) 25.69402 26.21466 26.44683 26.84257 33.62896   100
 .Internal(mean(t1)) 25.70497 26.16247 26.39396 26.63982 35.21054   100

Now method dispatch is only a small fraction of the overall time required.
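
To see how that fixed dispatch cost shrinks as the vector grows, you can run the same comparison across a few lengths (a rough sketch; the exact numbers will depend on your machine and R version):

library(microbenchmark)
for (n in c(1e1, 1e3, 1e5, 1e7)) {
  x <- rnorm(n)
  print(microbenchmark(mean(x), sum(x)/length(x), times = 100L))
}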

mnel
  • How do the benchmarks scale with input vectors of different lengths? – A5C1D2H2I1M1N2O1R2T1 Sep 04 '13 at 02:35
  • Any thought on why (with the long vectors) `sum(t1)/length(t1)` is faster than all the others? – Josh O'Brien Sep 04 '13 at 10:45
  • @JoshO'Brien `base::mean` does two passes over the vector (at C level) to correct for numerical accuracy (see src/summary.c). Btw, data.table's optimized mean also does that, to retain that nice feature and be exactly the same as base mean. `sum` is a single pass, and `length` is in the vector's header (no loop). – Matt Dowle Sep 04 '13 at 15:43
  • @MatthewDowle -- Thanks! I haven't written any C, but it looks like that second loop's role is +/- to ensure that you get the same accuracy for a vector of numbers in the range 10000000.1 - 10000000.2 as you do for a vector in the range 0.1 - 0.2 (up to an eventual truncation of the result in the first case). Is that about right? – Josh O'Brien Sep 04 '13 at 15:54
  • @JoshO'Brien: see my answer and the linked question. – Joshua Ulrich Sep 04 '13 at 16:07
  • @JoshuaUlrich -- Thanks. I had missed those. Zach's (wife's) answer to the linked question answers my question. – Josh O'Brien Sep 04 '13 at 16:16
  • Thanks for all these nice answers and comments! Nice breakdown of dispatch >> .default overhead >> internal and the variation with vector length. – Antoine Lizée Sep 04 '13 at 17:42

mean is slower than computing "by hand" for several reasons:

  1. S3 Method dispatch
  2. NA handling
  3. Error correction

Points 1 and 2 have already been covered. Point 3 is discussed in What algorithm is R using to calculate mean?. Basically, mean makes two passes over the vector in order to correct for floating-point errors, while sum makes only one pass.
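
A minimal R sketch of that two-pass idea (this only illustrates the algorithm; the real implementation in src/summary.c works at C level and accumulates in long double):

two_pass_mean <- function(x) {
  n <- length(x)
  m <- sum(x) / n         # first pass: naive mean
  m + sum(x - m) / n      # second pass: add back the accumulated rounding error
}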

Notice that identical(sum(t1)/length(t1), mean(t1)) may be FALSE, due to these precision issues.

> set.seed(21); t1 <- rnorm(1e7,,21)
> identical(sum(t1)/length(t1), mean(t1))
[1] FALSE
> sum(t1)/length(t1) - mean(t1)
[1] 2.539201e-16
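
The difference is far below the usual numerical tolerance, so a tolerance-based comparison such as all.equal() should still report the two results as equal:

all.equal(sum(t1)/length(t1), mean(t1))  # expected TRUE: ~2.5e-16 is well under the default tolerance of ~1.5e-8
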
Joshua Ulrich