
I have two matrices in R that I want to multiply:

a = matrix(rnorm(20*10000, mean=0, sd=5), 20, 10000)
b = matrix(rnorm(20*10000, mean=0, sd=5), 20, 10000)
t(a)%*%b

Given the large dimensions, this matrix multiplication takes a lot of time. Is there a specific way to make the computation faster? Are there any built-in functions in R to speed up such multiplications?

Jaap
raK1
    You can try Revolution R, which is an enhanced distribution of R. It can be found here: https://mran.revolutionanalytics.com/open/ – Raad Mar 10 '16 at 18:07

3 Answers


There are many ways to approach this depending upon your code, effort, and hardware.

  1. Use the 'best' function for the job

The simplest option is to use crossprod, which computes the same thing as t(a) %*% b (note: this gives only a small increase in speed):

crossprod(a,b) 
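A quick sanity check that crossprod gives the same result as the explicit transpose-and-multiply (a sketch; smaller matrices than the question's 20 x 10000 are used here so the resulting product stays modest in size):

```r
set.seed(1)
a <- matrix(rnorm(20 * 2000), 20, 2000)
b <- matrix(rnorm(20 * 2000), 20, 2000)

x <- t(a) %*% b       # explicit transpose, then multiply
y <- crossprod(a, b)  # single BLAS call, no explicit transpose

all.equal(x, y)       # TRUE up to floating-point tolerance
```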
  2. Use Rcpp (and likely RcppEigen/RcppArmadillo)

C++ will likely greatly increase the speed of your code. Using linear algebra libraries (hence Eigen and Armadillo) will likely help further. This assumes, however, that you are willing to write some C++.

  3. Use a better backend

After this point you are looking at BLAS backends such as OpenBLAS, Atlas, etc. Hooking these up to R varies depending upon your OS. It is quite easy if you are using a Debian system like Ubuntu. You can find a demo here. These can sometimes be leveraged further by libraries such as Armadillo and Eigen.
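You can check which BLAS/LAPACK your R is actually linked against, before and after hooking up a backend (the BLAS/LAPACK lines in sessionInfo() and La_library() require R >= 3.4):

```r
# Which BLAS/LAPACK libraries is this R build using?
sessionInfo()     # look for the "BLAS:" and "LAPACK:" lines
La_library()      # path of the LAPACK shared library in use (R >= 3.4)
extSoftVersion()  # versions of other external libraries R was built with
```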

  4. GPU computing

If you have a GPU (e.g. AMD, NVIDIA, etc.), you can leverage its many cores to greatly speed up your computations. A few packages could be useful here, including gpuR, gputools, and gmatrix.

EDIT - to address @jenesaisquoi's comment on the benefit of Rcpp

test.cpp

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
    arma::mat C = A * B;

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

test.R

library(Rcpp)

A <- matrix(rnorm(10000), 100, 100)
B <- matrix(rnorm(10000), 100, 100)

library(microbenchmark)
sourceCpp("test.cpp")
microbenchmark(A%*%B, armaMatMult(A, B), eigenMatMult(A, B), eigenMapMatMult(A, B))

Unit: microseconds
                  expr     min       lq     mean   median       uq      max neval
               A %*% B 885.846 892.1035 933.7457 901.1010 938.9255 1411.647   100
     armaMatMult(A, B) 846.688 857.6320 915.0717 866.2265 893.7790 1421.557   100
    eigenMatMult(A, B) 205.978 208.1295 233.1882 217.0310 229.4730  369.369   100
 eigenMapMatMult(A, B) 192.366 194.9835 207.1035 197.5405 205.2550  366.945   100
cdeterman
    I used microbenchmark to compare against `crossprod`, not much of a speed gain at all unfortunately. – Raad Mar 10 '16 at 18:09
  • @NBATrends you are correct, it isn't a major speed gain but it is the next logical step up. – cdeterman Mar 10 '16 at 18:10
    Should I have C++ installed to perform these operations? I tried to run the code in R and it gave me an error – raK1 Mar 10 '16 at 18:31
    @kon7 if you are using Windows you will need [Rtools](https://cran.r-project.org/bin/windows/Rtools/) installed. Otherwise Rcpp, RcppArmadillo, RcppEigen should be sufficient. – cdeterman Mar 10 '16 at 18:32
  • I have installed Rtools and still the code is not working. What are the exact steps I should follow to try the code on my Windows machine? – raK1 Mar 10 '16 at 20:09
  • @cdeterman if I just copy-paste the above into RStudio and load the 3 libraries (Rcpp, RcppEigen and RcppArmadillo), I get Error: unexpected symbol in "armaMatMult(arma::mat A".... – raK1 Mar 10 '16 at 21:52
  • @jenesaisquoi please can you provide me with the steps to make the above code work. Should I write test.cpp in C++? – raK1 Mar 10 '16 at 22:48
  • @kon7 yes, test.cpp is in c++ – cdeterman Mar 11 '16 at 12:15
  • @cdeterman I tried that however i am getting an error ? should both files be in the same directory ? how can I link Rcpp with C++ function ? – raK1 Mar 11 '16 at 18:09
  • Put the full path in for `sourceCpp` – cdeterman Mar 11 '16 at 18:26
  • Is `eigenMapMatMult` always better, or does it depend on the circumstance? In my case I'm getting a ~8x speedup, which is great. (Thanks!) – generic_user Oct 07 '17 at 05:02
  • @generic_user of the provided methods, I'm pretty sure it will always be faster as it avoids a copy. The only exception may be with a different linear algebra back end like OpenBLAS. Then the armadillo may make up some time but that would require more benchmarks on your specific machine. – cdeterman Oct 09 '17 at 13:20
  • @cdeterman thanks for this script - it's really useful for some benchmarking I've been doing. Interestingly, using the Intel MKL linear algebra back end (on Windows, with Microsoft Open R) completely inverts the results - A%*% B becomes much the quickest option. Any idea why that should be? I can understand it improving - but not ending up outright faster? All options seem to be using multiple cores - even on "normal" R - so I guess it's not changes there. – Mooks Jan 26 '18 at 22:10
  • @Mooks I'm not terribly surprised by MKL winning out. It is a highly optimized math library that Microsoft (Revolution R) worked hard at integrating well with R. I am not familiar with the details, but I suspect it is likely taking advantage of similar aspects as the 'mapped' Eigen structures, in addition to parallelization. – cdeterman Jan 29 '18 at 15:03
  • All of the GPU tools you mentioned have been removed from CRAN e.g. https://cran.r-project.org/web/packages/gmatrix/index.html – Z boson May 27 '19 at 07:38
  • @Zboson I am currently working on getting gpuR back up again. I should omit the others. – cdeterman May 28 '19 at 14:58

To add to cdeterman's answer: you can use Eigen's built-in parallelization for dense matrix products. To do so, you need to compile with OpenMP activated.

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
// [[Rcpp::plugins(openmp)]]

#include <omp.h>
#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
  arma::mat C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, 
                  Eigen::MatrixXd B, 
                  int n_cores){
  
  Eigen::setNbThreads(n_cores);
  //qDebug()  << Eigen::nbThreads( );
  Eigen::MatrixXd C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult2(const Eigen::Map<Eigen::MatrixXd> A,
                      Eigen::Map<Eigen::MatrixXd> B, 
                      int n_cores){
  
  Eigen::setNbThreads(n_cores);
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

Here are some benchmarks. Note that with N = k = 100, parallelization does not necessarily improve performance. Once the matrix dimensions get larger (N = k = 1000), parallelization starts to have an impact:

library(microbenchmark)

# Benchmark 1: N = k = 100

N <- 100
k <- 100

A <- matrix(rnorm(N*k), N, k)
B <- matrix(rnorm(N*k), k, N)

microbenchmark(A%*%B, 
               armaMatMult(A, B),
               eigenMatMult(A, B, n_cores = 1),
               eigenMatMult(A, B, n_cores = 2),
               eigenMatMult(A, B, n_cores = 4),
               eigenMapMatMult2(A, B, n_cores = 1),
               eigenMapMatMult2(A, B, n_cores = 2),
               eigenMapMatMult2(A, B, n_cores = 4), 
               times = 100
)

# Unit: microseconds
#                                 expr   min     lq    mean median     uq   max neval
#                              A %*% B 535.6 540.75 552.594 551.25 554.50 650.2   100
#                    armaMatMult(A, B) 542.0 549.10 560.975 556.35 560.25 738.1   100
#      eigenMatMult(A, B, n_cores = 1) 147.1 152.65 159.165 159.65 162.90 180.5   100
#      eigenMatMult(A, B, n_cores = 2)  97.1 109.90 124.496 119.60 127.50 391.8   100
#      eigenMatMult(A, B, n_cores = 4)  71.7  88.15 155.220 115.55 216.95 507.3   100
#  eigenMapMatMult2(A, B, n_cores = 1) 139.1 150.10 154.889 154.20 158.35 244.3   100
#  eigenMapMatMult2(A, B, n_cores = 2)  93.4 105.70 116.808 113.55 120.40 323.7   100
#  eigenMapMatMult2(A, B, n_cores = 4)  66.8  82.60 161.516 196.25 210.40 598.9   100

# Benchmark 2: N = k = 1000

N <- 1000
k <- 1000

A <- matrix(rnorm(N*k), N, k)
B <- matrix(rnorm(N*k), k, N)

microbenchmark(A%*%B, 
               armaMatMult(A, B),
               eigenMatMult(A, B, n_cores = 1),
               eigenMatMult(A, B, n_cores = 2),
               eigenMatMult(A, B, n_cores = 4),
               eigenMapMatMult2(A, B, n_cores = 1),
               eigenMapMatMult2(A, B, n_cores = 2),
               eigenMapMatMult2(A, B, n_cores = 4), 
               times = 100
)


Unit: milliseconds
                                expr      min        lq      mean    median        uq       max neval
                             A %*% B 597.1293 605.56840 814.52389 665.86650 1025.5896 1217.6475   100
                   armaMatMult(A, B) 603.3894 620.25675 830.98947 693.22355 1078.4853 1446.5127   100
     eigenMatMult(A, B, n_cores = 1) 131.4696 135.22475 186.69826 193.37870  219.8727  419.2043   100
     eigenMatMult(A, B, n_cores = 2)  67.8948  71.71355 114.52759  74.17380  173.3060  217.9513   100
     eigenMatMult(A, B, n_cores = 4)  41.8564  48.87075  79.55535  72.00705  106.8572  139.9629   100
 eigenMapMatMult2(A, B, n_cores = 1) 125.3890 129.26125 175.09933 177.23655  213.0536  298.2859   100
 eigenMapMatMult2(A, B, n_cores = 2)  62.2866  65.78785 115.74248  79.92470  167.0217  230.6307   100
 eigenMapMatMult2(A, B, n_cores = 4)  35.2977  40.42480  68.21669  63.13655   97.2571  118.2553   100
A.Fischer

Rcpp code:

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
// [[Rcpp::plugins(openmp)]]

#include <omp.h>
#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
  arma::mat C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, 
                  Eigen::MatrixXd B, 
                  int n_cores){
  
  Eigen::setNbThreads(n_cores);
  //qDebug()  << Eigen::nbThreads( );
  Eigen::MatrixXd C = A * B;
  
  return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult2(const Eigen::Map<Eigen::MatrixXd> A,
                      Eigen::Map<Eigen::MatrixXd> B, 
                      int n_cores){
  
  Eigen::setNbThreads(n_cores);
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

R code:

library(microbenchmark)

# Benchmark: N = k = 1000

N <- 1000
k <- 1000

A <- matrix(rnorm(N*k), N, k)
B <- matrix(rnorm(N*k), k, N)

microbenchmark(A%*%B, 
               crossprod(A,B),
               armaMatMult(A, B),
               eigenMatMult(A, B, n_cores = 1),
               eigenMatMult(A, B, n_cores = 2),
               eigenMatMult(A, B, n_cores = 4),
               eigenMapMatMult2(A, B, n_cores = 1),
               eigenMapMatMult2(A, B, n_cores = 2),
               eigenMapMatMult2(A, B, n_cores = 4), 
               times = 100)

Why is crossprod() the fastest option on my server?

Unit: milliseconds
                                expr       min        lq     mean   median       uq       max neval
                             A %*% B  9.630260  9.943458 11.54563 10.10754 11.04614 118.87243   100
                     crossprod(A, B)  9.668597  9.846900 10.54206 10.07100 11.05562  18.93269   100
                   armaMatMult(A, B) 12.460832 13.250503 19.07578 20.17903 23.33987  30.40554   100
     eigenMatMult(A, B, n_cores = 1) 57.248345 58.737071 60.26937 60.08885 61.47355  71.58064   100
     eigenMatMult(A, B, n_cores = 2) 31.465149 32.908495 34.27428 34.33102 35.28012  38.14678   100
     eigenMatMult(A, B, n_cores = 4) 18.724495 19.509954 21.49996 20.44093 22.34533  38.44640   100
 eigenMapMatMult2(A, B, n_cores = 1) 54.040737 55.628182 57.16926 57.13763 58.41168  67.12156   100
 eigenMapMatMult2(A, B, n_cores = 2) 28.965028 30.021924 31.26321 30.87678 32.45385  35.07308   100
 eigenMapMatMult2(A, B, n_cores = 4) 16.208926 16.830164 18.32303 17.30694 19.64176  33.19359   100

I am using Microsoft R 4.0. I guess Microsoft has optimized the matrix operations.

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRblas.so    
LAPACK: /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=zh_CN.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8            LC_COLLATE=zh_CN.UTF-8    
 [5] LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=zh_CN.UTF-8    LC_PAPER=zh_CN.UTF-8           LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.4-7    miraculix_1.0.5         RandomFieldsUtils_1.0.6 RevoUtils_11.0.2        RevoUtilsMath_11.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7               lattice_0.20-44          digest_0.6.27            grid_4.0.2              
 [5] evaluate_0.14            rlang_0.4.11             RcppArmadillo_0.10.6.0.0 Matrix_1.3-4            
 [9] rmarkdown_2.9            tools_4.0.2              RcppEigen_0.3.3.7.0      tinytex_0.32            
[13] nadiv_2.17.1             parallel_4.0.2           xfun_0.24                yaml_2.2.1              
[17] compiler_4.0.2           htmltools_0.5.1.1        knitr_1.33              
  • Please add further details to expand on your answer, such as working code or documentation citations. – Community Sep 01 '21 at 08:55
  • If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context. – Rosalie Bruel Sep 01 '21 at 12:34