Arithmetic in R faster on numerics as opposed to integers. What's going on?

Question

I was in the middle of converting some code that utilized mostly numeric data (i.e. doubles) to integers and did a quick benchmark to see how much efficiency I gained.

To my surprise it was slower... by about 20%. I thought I had done something wrong, but the original code was only a few basic arithmetical operations on moderately sized vectors, so I knew it wasn't that. Maybe my environment was messed up? I restarted fresh, and the same result... integers were less efficient.

This started a series of test and a dive into the rabbit hole. Here is my first test. We sum one million elements using base R's sum. Note that with R version 3.5.0 the timings are quite a bit different and with v 3.5.1, the timings are about the same (still not what one would expect):

set.seed(123)
int1e6 <- sample(1:10, 1e6, TRUE)
dbl1e6 <- runif(1e6, 1, 10)

head(int1e6)
# [1] 5 3 6 8 6 2
class(int1e6)
# [1] "integer"

head(dbl1e6)
# [1] 5.060628 2.291397 2.992889 5.299649 5.217105 9.769613
class(dbl1e6)
#[1] "numeric"

mean(dbl1e6)
# [1] 5.502034
mean(int1e6)
# [1] 5.505185

## R 3.5.0
library(microbenchmark)
microbenchmark(intSum = sum(int1e6), dblSum = sum(dbl1e6), times = 1000)
Unit: microseconds
  expr      min       lq      mean   median       uq      max neval
intSum 1033.677 1043.991 1147.9711 1111.438 1200.725 2723.834  1000
dblSum  817.719  835.486  945.6553  890.529  998.946 2736.024  1000

## R 3.5.1
Unit: microseconds
  expr     min       lq      mean   median        uq      max neval
intSum 836.243 877.7655  966.4443 950.1525  997.9025 2077.257  1000
dblSum 866.939 904.7945 1015.3445 986.4770 1046.4120 2541.828  1000

class(sum(int1e6))
# [1] "integer"
class(sum(dbl1e6))
#[1] "numeric"

From here on out both version 3.5.0 and 3.5.1 give nearly identical results.

Here is our first dive into the rabbit hole. Along with the documentation for sum (see ?sum), we see that sum is simply a generic function that is dispatched via standardGeneric. Digging deeper, we see it eventually calls R_execMethod here on line 516. This is where I get lost. It looks to me, like R_execClosure is called next followed by many different possible branches. I think the standard path is to call eval next, but I'm not sure. My guess is that eventually, a function is called in arithimetic.c but I can't find anything that specifically sums a vector of numbers. Either way, based off of my limited knowledge of method dispatching and C in general, my naive assumption is that a function that looks like the following is called:

template <typename T>
T sum(vector<T> x) {

    T mySum = 0;

    for (std::size_t i = 0; i < x.size(); ++i)
        mySum += x[i];

    return mySum;
}

I know there is no function overloading or vectors in C, but you get my point. My belief is that eventually, a bunch of the same type of elements are added to an element of the same type and eventually returned. In Rcpp we would have something like:

template <typename typeReturn, typename typeRcpp>
typeReturn sumRcpp(typeRcpp x) {
    typeReturn mySum = 0;
    unsigned long int mySize = x.size();

    for (std::size_t i = 0; i < mySize; ++i)
        mySum += x[i];

    return mySum;
}

// [[Rcpp::export]]
SEXP mySumTest(SEXP Rx) {
    switch(TYPEOF(Rx)) {
        case INTSXP: {
            IntegerVector xInt = as<IntegerVector>(Rx);
            int resInt = sumRcpp<int>(xInt);
            return wrap(resInt);
        }
        case REALSXP: {
            NumericVector xNum = as<NumericVector>(Rx);
            double resDbl = sumRcpp<double>(xNum);
            return wrap(resDbl);
        }
        default: {
            Rcpp::stop("Only integers and numerics are supported");   
        }
    }
}

And the benchmarks confirm my normal thinking about the inherit efficiency dominance of integers:

microbenchmark(mySumTest(int1e6), mySumTest(dbl1e6))
Unit: microseconds
             expr      min       lq      mean    median        uq      max neval
mySumTest(int1e6)  103.455  160.776  185.2529  180.2505  200.3245  326.950   100
mySumTest(dbl1e6) 1160.501 1166.032 1278.1622 1233.1575 1347.1660 1644.494   100

Binary Operators

This got me thinking further. Maybe it is just the complexity wrapped around standardGeneric that makes the different data types behave strangely. So, let's skip all that jazz and go straight to the binary operators (+, -, *, /, %/%)

set.seed(321)
int1e6Two <- sample(1:10, 1e6, TRUE)
dbl1e6Two <- runif(1e6, 1, 10)

## addition
microbenchmark(intPlus = int1e6 + int1e6Two, 
               dblPlus = dbl1e6 + dbl1e6Two, times = 1000)
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval
intPlus 2.531220 3.214673 3.970903 3.401631 3.668878 82.11871  1000
dblPlus 1.299004 2.045720 3.074367 2.139489 2.275697 69.89538  1000

## subtraction
microbenchmark(intSub = int1e6 - int1e6Two,
               dblSub = dbl1e6 - dbl1e6Two, times = 1000)
Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
intSub 2.280881 2.985491 3.748759 3.166262 3.379755 79.03561  1000
dblSub 1.302704 2.107817 3.252457 2.208293 2.382188 70.24451  1000

## multiplication
microbenchmark(intMult = int1e6 * int1e6Two, 
               dblMult = dbl1e6 * dbl1e6Two, times = 1000)
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval
intMult 2.913680 3.573557 4.380174 3.772987 4.077219 74.95485  1000
dblMult 1.303688 2.020221 3.078500 2.119648 2.299145 10.86589  1000

## division
microbenchmark(intDiv = int1e6 %/% int1e6Two,
               dblDiv = dbl1e6 / dbl1e6Two, times = 1000)
Unit: milliseconds
  expr      min       lq     mean   median       uq      max neval
intDiv 2.892297 3.210666 3.720360 3.228242 3.373456 62.12020  1000
dblDiv 1.228171 1.809902 2.558428 1.842272 1.990067 64.82425  1000

The classes are preserved as well:

unique(c(class(int1e6 + int1e6Two), class(int1e6 - int1e6Two),
         class(int1e6 * int1e6Two), class(int1e6 %/% int1e6Two)))
# [1] "integer"

unique(c(class(dbl1e6 + dbl1e6Two), class(dbl1e6 - dbl1e6Two),
         class(dbl1e6 * dbl1e6Two), class(dbl1e6 / dbl1e6Two)))
# [1] "numeric"

With every case, we see that arithmetic is 40% - 70% faster on numeric data type. What is really strange is that we get an even larger discrepancy when the two vectors being operated on are identical:

microbenchmark(intPlus = int1e6 + int1e6, 
               dblPlus = dbl1e6 + dbl1e6, times = 1000)
Unit: microseconds
   expr      min       lq     mean   median       uq      max neval
intPlus 2522.774 3148.464 3894.723 3304.189 3531.310 73354.97  1000
dblPlus  977.892 1703.865 2710.602 1767.801 1886.648 77738.47  1000

microbenchmark(intSub = int1e6 - int1e6,
               dblSub = dbl1e6 - dbl1e6, times = 1000)
Unit: microseconds
  expr      min       lq     mean   median       uq      max neval
intSub 2236.225 2854.068 3467.062 2994.091 3214.953 11202.06  1000
dblSub  893.819 1658.032 2789.087 1730.981 1873.899 74034.62  1000

microbenchmark(intMult = int1e6 * int1e6, 
               dblMult = dbl1e6 * dbl1e6, times = 1000)
Unit: microseconds
   expr      min       lq     mean   median       uq      max neval
intMult 2852.285 3476.700 4222.726 3658.599 3926.264 78026.18  1000
dblMult  973.640 1679.887 2638.551 1754.488 1875.058 10866.52  1000

microbenchmark(intDiv = int1e6 %/% int1e6,
               dblDiv = dbl1e6 / dbl1e6, times = 1000)
Unit: microseconds
  expr      min       lq     mean   median       uq      max neval
intDiv 2879.608 3355.015 4052.564 3531.762 3797.715 11781.39  1000
dblDiv  945.519 1627.203 2706.435 1701.512 1829.869 72215.51  1000

unique(c(class(int1e6 + int1e6), class(int1e6 - int1e6),
         class(int1e6 * int1e6), class(int1e6 %/% int1e6)))
# [1] "integer"

unique(c(class(dbl1e6 + dbl1e6), class(dbl1e6 - dbl1e6),
         class(dbl1e6 * dbl1e6), class(dbl1e6 / dbl1e6)))
# [1] "numeric"

That is nearly a 100% increase with every operator type!!!

How about a regular for loop in base R?

funInt <- function(v) {
    mySumInt <- 0L
    for (element in v)
        mySumInt <- mySumInt + element
    mySumInt
}

funDbl <- function(v) {
    mySumDbl <- 0
    for (element in v)
        mySumDbl <- mySumDbl + element
    mySumDbl
}

microbenchmark(funInt(int1e6), funDbl(dbl1e6))
Unit: milliseconds
          expr      min       lq     mean   median       uq      max neval
funInt(int1e6) 25.44143 25.75075 26.81548 26.09486 27.60330 32.29436   100
funDbl(dbl1e6) 24.48309 24.82219 25.68922 25.13742 26.49816 29.36190   100

class(funInt(int1e6))
# [1] "integer"
class(funDbl(dbl1e6))
# [1] "numeric"

The difference isn't amazing, but still one would expect the integer sum to outperform the double sum. I really don't know what to think about this.

So my question is:

Why exactly do numeric data types outperform integer data types on basic arithmetical operations in base R?

Edit. Forgot to mention this:

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

I'm not sure what tags should apply to this. if anybody thinks the tags should be different feel free to update. — Joseph Wood, Aug 18 '18 at 19:11
Maybe because it needs to check for integer overflow? (random guess) — F. Privé, Aug 18 '18 at 19:16
@F.Privé, that's a good point... I'll do some digging to see if that is the case — Joseph Wood, Aug 18 '18 at 19:35
On CentOS 7, R 3.5.0, I get medians of 675 for int and 1022 for double for the first benchmark. — F. Privé, Aug 18 '18 at 20:18
@F.Privé, I just added my session info. Have you tested the binary operators? — Joseph Wood, Aug 18 '18 at 20:26

Ralf Stubner · Accepted Answer · 2018-08-20T08:32:12.343

F.Privé's "random guess" in the comments is really good! The function do_arith seems to be the starting point within arithmetic.c. First for scalars we see that the case of REALSXP is simple: e.g., standard + is used. For INTSXP there is a dispatch to, for example, R_integer_plus, which does indeed check for integer overflow:

static R_INLINE int R_integer_plus(int x, int y, Rboolean *pnaflag)
{
    if (x == NA_INTEGER || y == NA_INTEGER)
    return NA_INTEGER;

    if (((y > 0) && (x > (R_INT_MAX - y))) ||
    ((y < 0) && (x < (R_INT_MIN - y)))) {
    if (pnaflag != NULL)
        *pnaflag = TRUE;
    return NA_INTEGER;
    }
    return x + y;
}

Similar for other binary operations. For vectors it is also similar. Within integer_binary there is a dispatch to the same method, while in real_binary the standard operations are used without any checks.

We can see this in action using the following Rcpp code:

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
#include <cstdint>
using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector sumInt(IntegerVector a, IntegerVector b) {
  IntegerVector result(no_init(a.size()));
  std::transform(a.begin(), a.end(), b.begin(), result.begin(),
                 [] (int32_t x, int32_t y) {return x + y;});
  return result;
}

// [[Rcpp::export]]
IntegerVector sumIntOverflow(IntegerVector a, IntegerVector b) {
  IntegerVector result(no_init(a.size()));
  std::transform(a.begin(), a.end(), b.begin(), result.begin(),
                 [] (int32_t x, int32_t y) {
    if (x == NA_INTEGER || y == NA_INTEGER)
      return NA_INTEGER;
    if (((y > 0) && (x > (INT32_MAX - y))) ||
        ((y < 0) && (x < (INT32_MIN - y))))
      return NA_INTEGER;
    return x + y;
  });
  return result;
}

// [[Rcpp::export]]
NumericVector sumReal(NumericVector a, NumericVector b) {
  NumericVector result(no_init(a.size()));
  std::transform(a.begin(), a.end(), b.begin(), result.begin(),
                 [] (double x, double y) {return x + y;});
  return result;
}

/*** R
set.seed(123)
int1e6 <- sample(1:10, 1e6, TRUE)
int1e6two <- sample(1:10, 1e6, TRUE)
dbl1e6 <- runif(1e6, 1, 10)
dbl1e6two <- runif(1e6, 1, 10)

microbenchmark::microbenchmark(int1e6 + int1e6two,
                               sumInt(int1e6, int1e6two),
                               sumIntOverflow(int1e6, int1e6two),
                               dbl1e6 + dbl1e6two,
                               sumReal(dbl1e6, dbl1e6two),
                               times = 1000)
*/

Result:

Unit: microseconds
              expr      min        lq     mean    median       uq       max neval
int1e6 + int1e6two 1999.698 2046.2025 2232.785 2061.7625 2126.970  5461.816  1000
            sumInt  812.560  846.1215 1128.826  861.9305  892.089 44723.313  1000
    sumIntOverflow 1664.351 1690.2455 1901.472 1702.6100 1760.218  4868.182  1000
dbl1e6 + dbl1e6two 1444.172 1501.9100 1997.924 1526.0695 1641.103 47277.955  1000
           sumReal 1459.224 1505.2715 1887.869 1530.5995 1675.594  5124.468  1000

Introducing the overflow checking into the C++ code produces a significant reduction in performance. Even though it is not as bad as the standard +. So if you know that your integer numbers are "well behaved", you can gain quite a bit of performance by skipping R's error checking by going straight to C/C++. This reminds me of another question with a similar conclusion. The error checking done by R can be costly.

For the case with identical vectors, I get the following benchmark results:

Unit: microseconds
           expr      min       lq     mean    median       uq       max neval
int1e6 + int1e6 1761.285 2000.720 2191.541 2011.5710 2029.528 47397.029  1000
         sumInt  648.151  761.787 1002.662  767.9885  780.129 46673.632  1000
 sumIntOverflow 1408.109 1647.926 1835.325 1655.6705 1670.495 44958.840  1000
dbl1e6 + dbl1e6 1081.079 1119.923 1443.582 1137.8360 1173.807 44469.509  1000
        sumReal 1076.791 1118.538 1456.917 1137.2025 1250.850  5141.558  1000

There is a significant performance increase for doubles (both R and C++). For integers there is also some performance increase, but not as seizable as for doubles.

I wonder if one could define an "unsafe" `%+%` operator (don't know if the dispatching would eat up any gains) — Ben Bolker, Aug 18 '18 at 21:43
This is fantastic work per usual. One thing that still bothers me a bit is when the vectors are the same, we get the increased efficiency (more so with numerics) with the binary operators. If I had to guess, something is going on with the cache and possibly some branch prediction.... continuing — Joseph Wood, Aug 19 '18 at 09:41
Maybe, after 10000 or so identical element wise additions are made, the CPU can guess that this will carry out for the rest of the computation and not require a hard load of the second element. Continuing with my guess, we lose this advantage with the additional check for integers as it has to leave the parent function to check for the integer overflow. — Joseph Wood, Aug 19 '18 at 09:43
@JosephWood I have added my benchmark results when using the same vector. The perforamnce increase is more pronounced for doubles than for ints (both R and C++). I am not sure where this comes from. BTW, `R_integer_plus` et al. might be inlined by the compiler. — Ralf Stubner, Aug 20 '18 at 08:34

Arithmetic in R faster on numerics as opposed to integers. What's going on?

Binary Operators

1 Answers1