2

I am trying to improve my loop computation speed by using foreach, but there is a simple Rcpp function I defined inside of this loop. I saved the Rcpp function as mproduct.cpp, and I call out the function simply using

sourceCpp("mproduct.cpp")

and the Rcpp function is a simple one, which is to perform matrix product in C++:

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

So, the function in the Rcpp file is MP, referring to matrix product. I need to perform the following foreach loop (I have simplified the code for illustration):

foreach(j=1:n, .package='Rcpp',.noexport= c("mproduct.cpp"),.combine=rbind)%dopar%{
n=1000000
A<-matrix(rnorm(n,1000,1000))
B<-matrix(rnorm(n,1000,1000))
S<-MP(A,B)
return(S)
} 

Since the size of matrix A and B are large, it is why I want to use foreach to alleviate the computational cost.

However, the above code does not work, since it provides me error message:

task 1 failed - "NULL value passed as symbol address"

The reason I added .noexport= c("mproduct.cpp") is to follow some suggestions from people who solved similar issues (Can't run Rcpp function in foreach - "NULL value passed as symbol address"). But somehow this does not solve my issue.

So I tried to install my Rcpp function as a library. I used the following code:

Rcpp.package.skeleton('mp',cpp_files = "<my working directory>")

but it returns me a warning message:

The following packages are referenced using Rcpp::depends attributes however are not listed in the Depends, Imports or LinkingTo fields of the package DESCRIPTION file: RcppArmadillo, RcppEigen 

so when I tried to install my package using

install.packages("<my working directory>",repos = NULL,type='source')

I got the warning message:

Error in untar2(tarfile, files, list, exdir, restore_times) : 
  incomplete block on file
In R CMD INSTALL
Warning in install.packages :
  installation of package ‘C:/Users/Lenovo/Documents/mproduct.cpp’ had non-zero exit status

So can someone help me out how to solve 1) using foreach with Rcpp function MP, or 2) install the Rcpp file as a package?

Thank you all very much.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
Rico
  • 33
  • 3
  • 1
    Is the Rcpp code a toy example or your real code? I doubt that you can gain a lot above `%*%` by using RcppEigen (or RcppArmadillo). However, when you need to make use of one of these packages, you should decide which one you want to use (Your current code does not use Armadillo at all ...). After that, use the `*.package.skeleton()` function from *that package*. – Ralf Stubner Oct 20 '19 at 10:31
  • Welcome to StackOverflow, Rico. Ralf is spot on: the other packages have skeleton generators too which you should use. – Dirk Eddelbuettel Oct 20 '19 at 11:46
  • @RalfStubner Thank you for your comment. The Rcpp code is my real code and the `foreach` code is my toy code, illustrating the fact that I am trying to perform the `foreach` loop when a Rcpp function lies inside. I am sorry but still didn't get your two points: 1) how should I improve `Eigen::MatrixXd C = A * B` using RcppEigen? and 2) how should I revise my `Rcpp` code so that I can only use RcppEigen package? – Rico Oct 20 '19 at 11:59
  • @DirkEddelbuettel Thank you. But I am pretty new for the Rcpp and all I need to have is to make the matrix product AB to be as faster as possible. I am not sure which packages have skeleton generators. – Rico Oct 20 '19 at 12:03
  • 1
    Rico: Please read (at least) the [Rcpp Introduction](https://cloud.r-project.org/web/packages/Rcpp/vignettes/Rcpp-introduction.pdf) vignette; it will clear a lot of things up. It also mentions RcppArmadillo a bit. Ralf and I pointed you the function `RcppArmadillo.package.skeleton()` in that package (or symmetrically in RcppEigen). Lastly, we are trying to explain to you that R already uses the same LAPACK matrix multiplication code. Just by calling it from C++ you will gain very little to nothing -- but see for yourself with some benchmarks. – Dirk Eddelbuettel Oct 20 '19 at 12:45

1 Answers1

2

The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:

set.seed(42)
n <- 1000
A<-matrix(rnorm(n*n), n, n)
B<-matrix(rnorm(n*n), n, n)

MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}", depends = "RcppEigen")

bench::mark(MP(A, B), A %*% B)[1:5]
#> # A tibble: 2 x 5
#>   expression      min   median `itr/sec` mem_alloc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 MP(A, B)    277.8ms    278ms      3.60    7.63MB
#> 2 A %*% B      37.4ms     39ms     22.8     7.63MB

So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations while you are on Windows, which often means reference BLAS for matrix operations. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for Windows user to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time on investigating this.

Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.

Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
  • 1
    Nice answer. And while I do not use Windows either but it is detailed in an appendix of one of the R manuals that ship with R, likely R Installation and Administration -- and Prof Ripley has for years supported this with a custom download from his site. – Dirk Eddelbuettel Oct 20 '19 at 12:53
  • @Ralf Stubner Thank you for providing those information. That is strange as I use the same code as yours, and I have 21.33445 mins for `MP(A,B)` but 143.46149 for `A%*%B`. Maybe due to operational system difference. I will read carefully your post and revise my file. Appreciate it! – Rico Oct 20 '19 at 13:37