1

I have to perform many matrix calculations to perform load flow calculations. I'm using R in general and C++ for computationally expensive steps. The the most computationally expensive parts are two matrix multiplications + transpose, and one time solving the equation Ax = b. I started using sparse matrices but because I managed to reduce the size of the matrices this doesn't seem the way to go anymore. Switching to dense matrices saved me the cost of building sparse matrices and the solve-step is much faster (~20 times). The multiplications are a bit slower but I ended up with a factor 3 to 6 of speed increase compared to sparse matrices.

To increase the speed further I tried the methods suggested at fast large matrix multiplication in R

I included an extra part of C++ code that performs all my calculations at once.

// [[Rcpp::export]]
SEXP eigenMapNodeAdm(const Eigen::Map<Eigen::MatrixXd> I, Eigen::Map<Eigen::MatrixXd> A){
  Eigen::MatrixXd Y = I * A * I.transpose() ;

  return Rcpp::wrap(Y);
}

For smaller matrices (~120 rows/columns and smaller) this turns out to be much faster than the standard R matrix multiplication and transpose. However for larger matrices the calculation times increase very dramatically. Any ideas about what causes this behaviour? Maybe there are better ways to multiply these relatively small but still quite sparse matrices? Thank you in advance for the help.

library(microbenchmark)
library(Rcpp)
library(RcppEigen)

sourceCpp("MatrixMtp.cpp") # From https://stackoverflow.com/questions/35923787/fast-large-matrix-multiplication-in-r
                           # but with own method added

A <- as.matrix(read.csv("https://pastebin.com/raw/iLBuN9jJ")) I <- as.matrix(read.csv("https://pastebin.com/raw/VNcicPkc"))

microbenchmark(I %*% A %*% Matrix::t(I),
               I %*% tcrossprod(A, I),
               armaMatMult(armaMatMult(I, A),Matrix::t(I)),
               eigenMatMult(eigenMatMult(I, A),Matrix::t(I)),
               eigenMapMatMult(eigenMapMatMult(I, A), Matrix::t(I)),
               eigenMapNodeAdm(I,A),
               times=100, check = "equivalent")

Unit: microseconds
                                                 expr       min         lq      mean    median        uq       max neval
                             I %*% A %*% Matrix::t(I)   743.334   960.5095  1802.524  1187.911  1459.810 35053.930   100
                               I %*% tcrossprod(A, I)   671.096   911.5160  1202.167  1163.660  1369.427  2052.610   100
         armaMatMult(armaMatMult(I, A), Matrix::t(I))   820.778  1345.8945  1996.141  1639.704  2437.173  4237.511   100
       eigenMatMult(eigenMatMult(I, A), Matrix::t(I)) 35358.135 43384.3715 49053.363 47713.833 53511.026 76970.653   100
 eigenMapMatMult(eigenMapMatMult(I, A), Matrix::t(I)) 33974.205 44098.6550 49799.859 49723.058 54936.584 76038.817   100
                                eigenMapNodeAdm(I, A) 30565.759 44404.1400 49241.233 48834.046 53759.213 72884.721   100

If I subset the matrices the Eigen implemenations are much faster:

microbenchmark(Isub %*% Asub %*% Matrix::t(Isub),
               Isub %*% tcrossprod(Asub, Isub),
               armaMatMult(armaMatMult(Isub, Asub),Matrix::t(Isub)),
               eigenMatMult(eigenMatMult(Isub, Asub),Matrix::t(Isub)),
               eigenMapMatMult(eigenMapMatMult(Isub, Asub), Matrix::t(Isub)),
               eigenMapNodeAdm(Isub, Asub),times=1000, check = "equivalent")
Unit: microseconds
                                                      expr     min       lq     mean   median       uq       max neval
                               Isub %*% Asub %*% Matrix::t(Isub) 341.197 376.6945 556.0409 519.2540 535.0250 27024.769  1000
                                 Isub %*% tcrossprod(Asub, Isub) 294.895 346.8450 456.7293 471.9845 486.3635 15247.886  1000
           armaMatMult(armaMatMult(Isub, Asub), Matrix::t(Isub)) 362.011 417.8405 566.1201 521.3710 569.3060 12531.499  1000
         eigenMatMult(eigenMatMult(Isub, Asub), Matrix::t(Isub)) 322.220 440.7020 532.7970 475.2170 519.2595 14328.881  1000
   eigenMapMatMult(eigenMapMatMult(Isub, Asub), Matrix::t(Isub)) 263.116 376.5190 502.5597 407.7840 450.3955 20956.507  1000
                                     eigenMapNodeAdm(Isub, Asub) 197.265 291.4280 330.9622 306.5935 320.9700  7650.685  1000

Content of the MatrixMtp.cpp file:

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
  arma::mat C = A * B;

  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
  Eigen::MatrixXd C = A * B;

  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
  Eigen::MatrixXd C = A * B;

  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapNodeAdm(const Eigen::Map<Eigen::MatrixXd> I, Eigen::Map<Eigen::MatrixXd> A){
  Eigen::MatrixXd Y = I * A * I.transpose() ;

  return Rcpp::wrap(Y);
}

Verbose output of sourceCpp:

> sourceCpp("inst/extdata/MatrixMtp.cpp", verbose = TRUE) # call the C++ file and we have three functions as armaMatMult,eigenMatMult,eigenMapMatMult.

Generated extern "C" functions 
--------------------------------------------------------


#include <Rcpp.h>
// armaMatMult
SEXP armaMatMult(arma::mat A, arma::mat B);
RcppExport SEXP sourceCpp_1_armaMatMult(SEXP ASEXP, SEXP BSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< arma::mat >::type A(ASEXP);
    Rcpp::traits::input_parameter< arma::mat >::type B(BSEXP);
    rcpp_result_gen = Rcpp::wrap(armaMatMult(A, B));
    return rcpp_result_gen;
END_RCPP
}
// eigenMatMult
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B);
RcppExport SEXP sourceCpp_1_eigenMatMult(SEXP ASEXP, SEXP BSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< Eigen::MatrixXd >::type A(ASEXP);
    Rcpp::traits::input_parameter< Eigen::MatrixXd >::type B(BSEXP);
    rcpp_result_gen = Rcpp::wrap(eigenMatMult(A, B));
    return rcpp_result_gen;
END_RCPP
}
// eigenMapMatMult
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B);
RcppExport SEXP sourceCpp_1_eigenMapMatMult(SEXP ASEXP, SEXP BSEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< const Eigen::Map<Eigen::MatrixXd> >::type A(ASEXP);
    Rcpp::traits::input_parameter< Eigen::Map<Eigen::MatrixXd> >::type B(BSEXP);
    rcpp_result_gen = Rcpp::wrap(eigenMapMatMult(A, B));
    return rcpp_result_gen;
END_RCPP
}
// eigenMapNodeAdm
SEXP eigenMapNodeAdm(const Eigen::Map<Eigen::MatrixXd> I, Eigen::Map<Eigen::MatrixXd> A);
RcppExport SEXP sourceCpp_1_eigenMapNodeAdm(SEXP ISEXP, SEXP ASEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< const Eigen::Map<Eigen::MatrixXd> >::type I(ISEXP);
    Rcpp::traits::input_parameter< Eigen::Map<Eigen::MatrixXd> >::type A(ASEXP);
    rcpp_result_gen = Rcpp::wrap(eigenMapNodeAdm(I, A));
    return rcpp_result_gen;
END_RCPP
}
// eigenMapCalcCurrents
SEXP eigenMapCalcCurrents(const Eigen::Map<Eigen::MatrixXd> I, Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> U);
RcppExport SEXP sourceCpp_1_eigenMapCalcCurrents(SEXP ISEXP, SEXP ASEXP, SEXP USEXP) {
BEGIN_RCPP
    Rcpp::RObject rcpp_result_gen;
    Rcpp::RNGScope rcpp_rngScope_gen;
    Rcpp::traits::input_parameter< const Eigen::Map<Eigen::MatrixXd> >::type I(ISEXP);
    Rcpp::traits::input_parameter< Eigen::Map<Eigen::MatrixXd> >::type A(ASEXP);
    Rcpp::traits::input_parameter< Eigen::Map<Eigen::MatrixXd> >::type U(USEXP);
    rcpp_result_gen = Rcpp::wrap(eigenMapCalcCurrents(I, A, U));
    return rcpp_result_gen;
END_RCPP
}

Generated R functions 
-------------------------------------------------------

`.sourceCpp_1_DLLInfo` <- dyn.load('/data/tmp/R/RtmpPJH0wY/sourceCpp-x86_64-redhat-linux-gnu-1.0.2.3/sourcecpp_620a5c2e111b/sourceCpp_8.so')

armaMatMult <- Rcpp:::sourceCppFunction(function(A, B) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_armaMatMult')
eigenMatMult <- Rcpp:::sourceCppFunction(function(A, B) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_eigenMatMult')
eigenMapMatMult <- Rcpp:::sourceCppFunction(function(A, B) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_eigenMapMatMult')
eigenMapNodeAdm <- Rcpp:::sourceCppFunction(function(I, A) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_eigenMapNodeAdm')
eigenMapCalcCurrents <- Rcpp:::sourceCppFunction(function(I, A, U) {}, FALSE, `.sourceCpp_1_DLLInfo`, 'sourceCpp_1_eigenMapCalcCurrents')

rm(`.sourceCpp_1_DLLInfo`)

Building shared library
--------------------------------------------------------

DIR: /data/tmp/R/RtmpPJH0wY/sourceCpp-x86_64-redhat-linux-gnu-1.0.2.3/sourcecpp_620a5c2e111b

/usr/lib64/R/bin/R CMD SHLIB -o 'sourceCpp_8.so'  'MatrixMtp.cpp'  
g++ -m64 -std=gnu++11 -I"/usr/include/R" -DNDEBUG -I../inst/include -fopenmp  -I"/data/users/al5696/Rlib/Rcpp/include" -I"/data/common/R/library/RcppArmadillo/include" -I"/data/common/R/library/RcppEigen/include" -I"/data/users/al5696/ms_n_min_1_berekening3/inst/extdata" -I/usr/local/include   -fpic  -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches   -m64 -mtune=generic -c MatrixMtp.cpp -o MatrixMtp.o
g++ -m64 -std=gnu++11 -shared -L/usr/lib64/R/lib -Wl,-z,relro -o sourceCpp_8.so MatrixMtp.o -fopenmp -L/usr/lib64/R/lib -lRlapack -L/usr/lib64/R/lib -lRblas -lgfortran -lm -lquadmath -L/usr/lib64/R/lib -lR
Jacco
  • 87
  • 1
  • 11
  • Make sure your cpp files are properly compiled with compiler optimizations ON (`-O3`) and full support of your CPU (`-march=native`) – ggael Nov 13 '19 at 20:24
  • Did you tell Eigen to use your BLAS library? – DaBookshah Nov 14 '19 at 04:37
  • @ggael Would this probably be the cause of this issue? Currently my compiler uses O2, but I'm not allowed to change the Makeconf file on the server I'm working on (https://stackoverflow.com/questions/48917107/rcpp-sourcecpp-compiler-settings). Would it be possible to compile this function by hand and use it in R? – Jacco Nov 15 '19 at 20:28
  • @DaBookshah How can I check that? I included the verbose output of sourceCpp, that shows the full compiler settings. – Jacco Nov 15 '19 at 20:33
  • Yeah, looks like you didn't. Organise to add EIGEN_USE_BLAS as a define. This is consistent with your performance description. – DaBookshah Nov 17 '19 at 01:57

0 Answers0