If you really want performance improvements, the code has to be written to exploit the underlying hardware concurrency. You can do this with the RcppParallel package; its parallelFor would be an ideal vessel for this.
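To make the idea concrete, here is a minimal sketch in plain standard C++ of the chunked-range pattern that a parallel-for style loop relies on: the index range is cut into contiguous pieces and each piece is handled by its own thread. This is not RcppParallel's API (there, as far as I know, you write a Worker whose operator() receives a begin/end pair and pass it to parallelFor). The names exp_range and parallel_exp are hypothetical helpers invented for this illustration, and std::vector stands in for an R numeric vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Rcpp or RcppParallel): applies std::exp
// to the half-open index range [begin, end) of `in`, writing into `out`.
void exp_range(const std::vector<double>& in, std::vector<double>& out,
               std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        out[i] = std::exp(in[i]);
}

// Hypothetical helper: splits the input into contiguous chunks and runs
// exp_range on each chunk in a separate thread. Chunks never overlap, so
// the threads write to disjoint elements and need no locking.
std::vector<double> parallel_exp(const std::vector<double>& in,
                                 unsigned n_threads = 0) {
    if (n_threads == 0)  // 0 means "use whatever the hardware reports"
        n_threads = std::max(1u, std::thread::hardware_concurrency());

    const std::size_t n = in.size();
    std::vector<double> out(n);
    assert(out.size() == in.size());

    // Ceiling division so the last chunk picks up any remainder.
    const std::size_t chunk = (n + n_threads - 1) / n_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back([&in, &out, begin, end] {
            exp_range(in, out, begin, end);
        });
    }
    for (auto& w : workers)
        w.join();  // wait for every chunk before returning the result
    return out;
}
```

Calling parallel_exp(x) on a std::vector<double> returns a new vector holding exp of each element. Whether this beats a serial loop depends on the vector length and the cost of starting threads, so it should be benchmarked rather than assumed.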
You can also try a more modern implementation of R/C++. The next version of Rcpp11, due for release in a few days, will feature automatically threaded sugar, which makes the expSugar function from the previous answer faster.
Consider:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector exp2(NumericVector x) {
    NumericVector z = Rcpp::clone(x);
    int n = z.size();
    for (int i = 0; i < n; ++i)
        z[i] = exp(z[i]);
    return z;
}

// [[Rcpp::export]]
NumericVector expSugar(NumericVector x) {
    return exp(x);
}

/*** R
library(microbenchmark)
x <- rcauchy(1000000)
microbenchmark(exp(x), exp2(x), expSugar(x))
*/
With Rcpp, I get:
$ RcppScript /tmp/exp.cpp
> library(microbenchmark)
> x <- rcauchy(1e+06)
> microbenchmark(exp(x), exp2(x), expSugar(x))
Unit: milliseconds
        expr      min       lq   median       uq      max neval
      exp(x) 7.027006 7.222141 7.421041 8.631589 21.78305   100
     exp2(x) 6.631870 6.790418 7.064199 8.145561 31.68552   100
 expSugar(x) 6.491868 6.761909 6.888111 8.154433 27.36302   100
So this is a nice but rather marginal improvement, which can be explained by inlining and similar effects, as described in other answers and comments.
With Rcpp11 and automatically threaded sugar, I get:
$ Rcpp11Script /tmp/exp.cpp
> library(microbenchmark)
> x <- rcauchy(1e+06)
> microbenchmark(exp(x), exp2(x), expSugar(x))
Unit: milliseconds
        expr      min       lq   median       uq      max neval
      exp(x) 7.029882 7.077804 7.336214 7.656472 15.38953   100
     exp2(x) 6.636234 6.748058 6.917803 7.017314 12.09187   100
 expSugar(x) 1.652322 1.780998 1.962946 2.261093 12.91682   100