4

Below is my experiment:

```r
> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx, yy)), 20)
[1] 293.71560548324600859
```

It is strange to me that `sum()` and `+` give different results when both are applied to the same numbers.

Is this result expected?

How can I get the same result?

Which one is most efficient?

Bogaso
  • Relevant SO question + answer re floating point number precision: https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – Jon Spring Oct 12 '21 at 21:19
  • Interesting to note that `sum(xx,yy)` is the same as `xx+yy`. Also ``Reduce(`+`, c(xx,yy))``. It's just `sum(c(xx,yy))` that's the odd man out. – MrFlick Oct 12 '21 at 21:21
  • But should not this precision impact both `sum` and `+` in same way? What is really different between those 2? – Bogaso Oct 12 '21 at 21:21

2 Answers

5

There is an r-devel thread here that includes some detailed description of the implementation. In particular, from Tomas Kalibera:

R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."

This would imply that sum() is more accurate, although this comes with a giant flashing warning sign that if this level of accuracy is important to you, you should be very worried about the implementation of your calculations [in terms both of algorithms and underlying numerical implementations].
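To make that concrete, here is a small sketch (assuming IEEE-754 doubles) that first checks whether this build of R has long doubles at all, then recovers the rounding error of the single double-precision addition using the Fast2Sum error-free transformation (valid here because `|xx| >= |yy|`) — this error term is exactly the information that `sum()`'s extended-precision accumulator retains and plain `+` discards:

```r
# Does this build of R have long doubles? (TRUE on most x86 builds)
capabilities("long.double")

xx <- 293.62882204364098
yy <- 0.086783439604999998

s <- xx + yy        # the double-precision sum
e <- yy - (s - xx)  # rounding error of that addition (Fast2Sum)

sprintf("%.20g", s)  # the rounded result
sprintf("%.20g", e)  # the part lost to rounding
```

In exact arithmetic `s + e` equals the true sum `xx + yy`, so a nonzero `e` tells you the double-precision addition was inexact.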

I answered a question here where I eventually figured out (after some false starts) that the difference between + and sum() is due to the use of extended precision for sum().

This code shows that per-argument sums (as in `sum(xx, yy)`) are added together with `+` (in C), whereas this code is used to sum the individual components of a vector; line 154 (`LDOUBLE s = 0.0`) shows that the accumulator is stored in extended precision (if available).
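You can see this split in behavior directly from the R prompt (the result for the vector form is platform-dependent, since it relies on long double being available):

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

# Per-argument sums are combined with plain double-precision `+` in C,
# so these agree bit-for-bit with `xx + yy`:
sum(xx, yy) == xx + yy             # TRUE
Reduce(`+`, c(xx, yy)) == xx + yy  # TRUE

# A single vector goes through the extended-precision accumulator,
# so it can differ in the last bit (where long double is available):
sum(c(xx, yy)) == xx + yy
```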

I believe that @JonSpring's timing results are probably explained (but would be happy to be corrected) by (1) `sum(xx, yy)` will have more processing, type-checking etc. than `+`; (2) `sum(c(xx, yy))` will be slightly slower than `sum(xx, yy)` because it works in extended precision.

Ben Bolker
    The source code is actually a function called [do_summary](https://github.com/wch/r-source/blob/f9c955fc6699a1f0482e4281ba658215c0e0b949/src/main/summary.c#L541). It handles `sum()`, `min()`, `max()` and `prod()` – MrFlick Oct 12 '21 at 21:53
  • I saw that but got confused ... thought it had something to do with `summary()`. – Ben Bolker Oct 12 '21 at 22:22
  • The translation from primitive name to c function name happens in [names.c](https://github.com/wch/r-source/blob/trunk/src/main/names.c). That would have made sense if `do_summary` was related to `summary()` but it turns out `summary.default` isn't a primitive function so all the work happens in R code, not C code. – MrFlick Oct 12 '21 at 22:50
  • Thanks. Then why are `print(sum(c(xx, yy)), 20)` and `print(sum(xx, yy), 20)` different? – Bogaso Oct 13 '21 at 05:00
  • `sum(xx, yy, 20)` is equivalent to `sum(xx) + sum(yy) + 20`. – Ben Bolker Oct 14 '21 at 13:17
2

Looks like `+` is about 3x as fast as `sum()` here, but unless you're doing high-frequency trading I can't see a situation where this would be your timing bottleneck.

```r
xx = 293.62882204364098
yy = 0.086783439604999998

microbenchmark::microbenchmark(xx + yy, sum(xx, yy), sum(c(xx, yy)))
# Unit: nanoseconds
#            expr min    lq   mean median    uq  max neval
#         xx + yy  88 102.5 111.90  107.0 110.0  352   100
#     sum(xx, yy) 201 211.0 256.57  218.5 232.5 2886   100
#  sum(c(xx, yy)) 283 297.5 330.42  304.0 311.5 1944   100
```
Jon Spring