Rcpp programming efficiency

Question

I am a beginner with Rcpp. Currently I wrote a Rcpp code, which was applied on two 3 dimensional arrays: Array1 and Array2. Suppose Array1 has dimension (1000, 100, 40) and Array2 has dimension (1000, 96, 40).

I would like to perform wilcox.test using:

wilcox.test(Array1[i, j,], Array2[i,,])

In R, I wrote nested for loops that completed the calculation in about a half hour.

Then, I wrote it into Rcpp. The calculation within Rcpp took an hour to achieve the same results. I thought it should be faster since it is written in C++ language. I guess that my style of coding is the cause of the low efficient.

The following is my Rcpp code, would you mind helping me find out what improvement should I make please? I appreciate it!

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector Cal(NumericVector Array1,NumericVector Array2,Function wilc) {

  NumericVector vecArray1(Array1);
  IntegerVector arrayDims1 = vecArray1.attr("dim");

  NumericVector vecArray2(Array2);
  IntegerVector arrayDims2 = vecArray2.attr("dim");

  arma::cube cubeArray1(vecArray1.begin(), arrayDims1[0], arrayDims1[1],      arrayDims1[2], false);

  arma::cube cubeArray2(vecArray2.begin(), arrayDims2[0], arrayDims2[1], arrayDims2[2], false);

  arma::mat STORE=arma::mat(arrayDims1[0], arrayDims1[1]); 

  for(int i=0;i<arrayDims1[1];i++)
  { 

    for(int j=0;j<arrayDims1[0];j++){
      arma::vec       v_cl=cubeArray1.subcube(arma::span(j),arma::span(i),arma::span::all);

      //arma::mat      tem=cubeArray2.subcube(arma::span(j),arma::span::all,arma::span::all);

      //arma::vec v_ct=arma::vectorise(tem);

      arma::vec   v_ct=arma::vectorise(cubeArray2.subcube(arma::span(j),arma::span::all,arma::span::all));

      Rcpp::List resu=wilc(v_cl,v_ct);
      STORE(j,i)=resu[2];

    }

  }


  return(Rcpp::wrap(STORE));

}

The function wilc will be wilcox.test from R.

The following is part of my R code for implementing the above idea, where CELLS and CTRLS are two 3D array in R.

for(i in 1:ncol(CELLS)) {
  if(T){ print(i) }
  for (j in 1:dim(CELLS)[1]) {
    wtest = wilcox.test(CELLS[j,i,], CTRLS[j,,])
    TSTAT_clcl[j,i] = wtest$p.value
  }
}

Calling R from C++ inside each loop is ... pretty much the same as writing in R. You had the wrong starting assumption: the loops you rewrote were not the bottleneck. Next time, maybe profile first. — Dirk Eddelbuettel, Feb 17 '17 at 12:09
Hi @DirkEddelbuettel, thank you for your comment! I rewrote it in R using foreach and use doSNOW to implement code parallel. Now given a 1000*100*40 array, it takes 18 minutes to get a result. However it is still not ideal. It is very challenge for someone who lacks computer science knowledge like me to try to improve the performance of the code. Anyway, I find it also interesting! — Song Tang, Feb 17 '17 at 13:48

coatless · Accepted Answer · 2017-02-17T08:34:12.270

1

Then, I wrote it into Rcpp. The calculation within Rcpp took an hour to achieve the same results. I thought it should be faster since it is written in C++ language.

The required disclaimer:

Embedding R code in C++ and expecting a speed up is a fool's game. You will need to rewrite wilcox.test full in C++ instead of making a call to R. Otherwise, you lose whatever speed up advantage you get.

In particular, I wrote up a post illustrating this conundrum regarding the using the diff function in R. Within the post, I detailed comparing a pure C++ implementation, an C++ implementation using an R function within the routine, and a pure R implementation. Stealing the microbenchmark illustrates the above issue.

expr        min     lq      mean        median  uq      max         neval
arma_fun    26.117  27.318  37.54248    28.218  29.869  751.087     100
r_fun       127.883 134.187 212.81091   138.390 151.148 1012.856    100
rcpp_fun    250.663 265.972 356.10870   274.228 293.590 1430.426    100

Thus, a pure C++ implementation had the largest speed up.

Hence, the take away is the need to translate the wilcox.test R routine code to a pure C++ implementation to drop the run time. Otherwise, it is meaningless to write the code in C++ because the C++ component must stop and await results from R before continuing. This traditionally has a lot of overhead to ensure the data is well protected.

edited Feb 17 '17 at 08:34

answered Feb 17 '17 at 08:28

coatless

20,011
13
69
84

Thank you so much! I found a Wilcoxon signed-rank test implemented in C++ [ALGLIB](http://www.alglib.net/hypothesistesting/wilcoxonsignedrank.php). May I have your opinion about this? Does it worth a try? I appreciate it. – Song Tang Feb 17 '17 at 08:42
There isn't presently a wrapper to use ALGLIB within the _R_ ecosystem. (c.f. http://stackoverflow.com/questions/27775923/is-it-possible-to-use-alglib-with-rcpp) You will likely have to implement it yourself. – coatless Feb 17 '17 at 08:44
That's a bad news. Anyway thank you very much! It is my first time asking question on this website, I did not expect I could get feedback so quickly, thank you @coatless ! – Song Tang Feb 17 '17 at 09:00
@SongTang Seems that you are a lucky guy and somebody else already did the work of reimplementation, check this out: https://github.com/stenver/wilcoxon-test. – NoBackingDown Feb 18 '17 at 20:06
Hi @Dominik, thank you for your suggestion! Unfortunately that code is implemented under Linux system. I am currently using Windows. Anyway, Rcpp is really a powerful tool! – Song Tang Feb 18 '17 at 20:40
@SongTang This is unfortunate. You might then consider to re-implement it yourself, the algorithm seems to be not too difficult https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test. – NoBackingDown Feb 18 '17 at 20:55

Rcpp programming efficiency

1 Answers1

Linked