I am calculating power flow across a network, but for larger networks the calculation is slow. I tried implementing the algorithm using RcppArmadillo. The Rcpp function is several times faster for small networks/matrices but becomes just as slow for larger ones. The speed of this operation limits the usefulness of the functions that depend on it. Am I writing the functions badly, or are these just the options I have?
The R code below gives an example of the execution times for matrices of size 10, 100, and 1000.
library(igraph); library(inline); library(RcppArmadillo); library(microbenchmark)
#Create the basic R implementation of the equation
flowCalc <- function(A, C){
  B <- t(A) %*% C %*% A
  Imp <- solve(B)
  PTDF <- C %*% A %*% Imp
  Out <- list(Imp = Imp, PTDF = PTDF)
  return(Out)
}
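As a side note on the base R version: `crossprod(A, M)` computes `t(A) %*% M` without materialising the transpose, which is usually slightly faster for large matrices. A minimal sketch of the same calculation using it (the name `flowCalcCp` is my own, not from the original code):

```r
# Same calculation as flowCalc, using crossprod() to avoid forming t(A)
flowCalcCp <- function(A, C){
  B <- crossprod(A, C %*% A)  # equivalent to t(A) %*% C %*% A
  Imp <- solve(B)
  PTDF <- C %*% A %*% Imp
  list(Imp = Imp, PTDF = PTDF)
}
```

This only changes how the product is formed; the `solve()` call still dominates for large matrices.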
#Create the c++ implementation
txt <- 'arma::mat Am = Rcpp::as< arma::mat >(A);
        arma::mat Cm = Rcpp::as< arma::mat >(C);
        arma::mat B = inv(trans(Am) * Cm * Am);
        arma::mat PTDF = Cm * Am * B;
        return Rcpp::List::create(Rcpp::Named("Imp") = B,
                                  Rcpp::Named("PTDF") = PTDF);'
flowCalcCpp <- cxxfunction(signature(A="numeric",
C="numeric"),
body=txt,
plugin="RcppArmadillo")
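For what it's worth, the same kernel can also be compiled with `Rcpp::cppFunction()` instead of `inline::cxxfunction()`. This is just an alternative way to build an identical function (the name `flowCalcCpp2` is mine), not a performance change:

```r
library(Rcpp)
# Same computation as above, compiled via Rcpp attributes rather than inline
cppFunction(depends = "RcppArmadillo", '
Rcpp::List flowCalcCpp2(const arma::mat& A, const arma::mat& C) {
  arma::mat B    = arma::inv(A.t() * C * A);
  arma::mat PTDF = C * A * B;
  return Rcpp::List::create(Rcpp::Named("Imp")  = B,
                            Rcpp::Named("PTDF") = PTDF);
}')
```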
#Create a function to generate dummy data
MakeData <- function(n, edgep){ # make an Erdos-Renyi graph with n vertices
  g <- erdos.renyi.game(n, edgep)
  # convert to adjacency matrix
  Adjmat <- as_adjacency_matrix(g, sparse = FALSE)
  # create a random weight matrix and mask the elements with no edge
  Cmat <- matrix(rnorm(n * n), ncol = n) * Adjmat
  # output a list of the two matrices
  list(A = Adjmat, C = Cmat)
}
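One caveat with random graphs like this (my own observation, not part of the benchmark): if the graph has isolated vertices, `B = t(A) %*% C %*% A` is singular and `solve(B)` errors. A cheap guard using the reciprocal condition number, sketched below with a hypothetical helper name:

```r
# Guard against (near-)singular B before inverting; rcond() is base R
B_invertible <- function(B, tol = 1e-10) {
  rc <- rcond(B)  # reciprocal condition number; near 0 for singular B
  is.finite(rc) && rc > tol
}
```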
#generate dummy data
set.seed(133)
Data10 <- MakeData(10, 1/1)
Data100 <- MakeData(100, 1/10)
Data1000 <- MakeData(1000, 1/100)
#Compare results
BenchData <- microbenchmark(
R10 = flowCalc(Data10$A, Data10$C),
R100 = flowCalc(Data100$A, Data100$C),
R1000 = flowCalc(Data1000$A, Data1000$C),
Cpp10 = flowCalcCpp(Data10$A, Data10$C),
Cpp100 = flowCalcCpp(Data100$A, Data100$C),
Cpp1000 = flowCalcCpp(Data1000$A, Data1000$C))
The speed results are shown below.
EDIT:
After reading Ralf's answer and Dirk's comments, I used https://cran.r-project.org/web/packages/gcbd/vignettes/gcbd.pdf to get a better overview of the differences between BLAS implementations.
I then used Dirk's guide to installing the Intel MKL BLAS implementation, https://github.com/eddelbuettel/mkl4deb (clearly, I now live in the Eddelverse).
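In case it helps anyone reproducing this: you can confirm from within R which BLAS/LAPACK libraries it is actually linked against (the `BLAS` and `LAPACK` fields of `sessionInfo()` and `La_library()` are available from R 3.4.0):

```r
# Print the BLAS/LAPACK libraries R is linked against (R >= 3.4)
si <- sessionInfo()
print(si$BLAS)       # e.g. an MKL shared library path after the switch
print(si$LAPACK)
print(La_library())  # path of the LAPACK shared library in use
```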
Once I had done that, I installed ArrayFire and RcppArrayFire as Ralf suggested, then re-ran the code and got the following results:
Unit: microseconds
expr min lq mean median uq max neval
R10 37.348 83.2770 389.6690 143.9530 181.8625 25022.315 100
R100 464.148 587.9130 1187.3686 680.8650 849.0220 32602.678 100
R1000 143065.901 160290.2290 185946.5887 191150.2335 201894.5840 464179.919 100
Cpp10 11.946 30.6120 194.8566 55.6825 74.0535 13732.984 100
Cpp100 357.880 452.7815 987.8059 496.9520 554.5410 39359.877 100
Cpp1000 102949.722 124813.9535 136898.4688 132852.9335 142645.6450 214611.656 100
AF10 713.543 833.6270 1135.0745 966.5920 1057.4175 8177.957 100
AF100 2809.498 3152.5785 3662.5323 3313.0315 3569.7785 12581.535 100
AF1000 77905.179 81429.2990 127087.2049 82579.6365 87249.0825 3834280.133 100
Compared with my earlier timings, the times are only slightly reduced for the smaller matrices, about twice as fast for matrices on the order of 100, and almost 10 times faster for the largest matrices, which is consistent with Ralf's results. The advantage of the C++ version also grew.
With the BLAS upgrade it is worth using the C++ version on my data, but probably not the ArrayFire version, as my matrices aren't big enough.