Can I speedup my R code with Rcpp?

Question

I defined an R function that contains a matrix, a vector and a parameter a. I need to compute the results of the function for different values of a. This is simple to code in R, but very slow when the matrix is "big" and number of the parameter values are large.

Can I define the function in R and do the for loop in Rcpp?

Can it speed up the computations?

A minimal example of a foo function in R is

f <- function(X,y,a){
  p = ncol(X)
  res = (crossprod(X) + a*diag(1,p))%*%crossprod(X,y)
  }

set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)

result <- matrix(NA,ncol(X),length(a))

for(i in 1:length(a)){                  # Can I do this part in Rcpp?
  result[,i] <- f(X,y,a[i])
  }

result

So you're only interested in answers involving `Rcpp`, not any other way of speeding it up, right? — Hack-R, Jul 22 '18 at 18:30
Would this post for [fast large matrix multiplication](https://stackoverflow.com/questions/35923787/fast-large-matrix-multiplication-in-r) be of any help ? The `%*%` can be done using `crossprod` according to this post. It also talks about Rcpp. — steveb, Jul 22 '18 at 18:30
If you need matrix multiplications, you can use {RcppArmadillo}. Yet, you won't gain much from the looping part because matrix computations are the same and are likely to take most of the time in your code. — F. Privé, Jul 23 '18 at 06:09

Zheyuan Li · Answer 1 · 2018-07-24T13:42:38.150

I simply suggest avoiding recomputing X'X and X'y in the loop as they are loop-invariant.

f <- function (XtX, Xty, a) (XtX + diag(a, ncol(XtX))) %*% Xty

set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)

result1 <- matrix(NA, ncol(X), length(a))

XtX <- crossprod(X)
Xty <- crossprod(X, y)

for(i in 1:length(a)) {
  result1[,i] <- f(XtX, Xty, a[i])
  }

## compare with your `result`
all.equal(result, result1)
#[1] TRUE

hours later...

When I return I see something more to simplify.

(XtX + diag(a, ncol(XtX))) %*% Xty = XtX %*% Xty + diag(a, ncol(XtX)) %*% Xty
                                   = XtX %*% Xty + a * Xty

So actually, XtX %*% Xty is loop-invariant, too.

f <- function (XtX.Xty, Xty, a) XtX.Xty + a * Xty

set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)

result2 <- matrix(NA, ncol(X), length(a))

XtX <- crossprod(X)
Xty <- c(crossprod(X, y))  ## one-column matrix to vector
XtX.Xty <- c(XtX %*% Xty)  ## one-column matrix to vector

for(i in 1:length(a)) {
  result2[,i] <- f(XtX.Xty, Xty, a[i])
  }

## compare with your `result`
all.equal(result, result2)
#[1] TRUE

And it turns out that we can get rid of the loop:

## "inline" function `f`
for(i in 1:length(a)) {
  result2[,i] <- XtX.Xty + a[i] * Xty
  }

## compare with your `result`
all.equal(result, result2)
#[1] TRUE

## do it with recycling rule
for(i in 1:length(a)) {
  result2[,i] <- a[i] * Xty
  }
result2 <- XtX.Xty + result2

## compare with your `result`
all.equal(result, result2)
#[1] TRUE

## now use `tcrossprod`
result2 <- XtX.Xty + tcrossprod(Xty, a)

## compare with your `result`
all.equal(result, result2)
#[1] TRUE

I completely agree with you that your example code in the question is just a "foo". And you may not have thought carefully about it when posting it. However, it suffices to show that when writing a loop, either an R loop or a C / C++ / FORTRAN loop, we should always seek to pull those loop-invariant out of the loop to reduce computational complexity.

Your concern is to get speedup with Rcpp (or any compiled language). You want to vectorize a section of R code that is not readily vectorized. However, "%*%", crossprod and tcrossprod are mapped to BLAS (FORTRAN code) and are not R-level computations. You readily have everything vectorized.

Don't always blame the interpretation overhead (as R is an interpreted language) of an R-loop for the poor performance. Such overhead is insignificant if each iteration is doing some "heavy" computations, like big matrix computations (using BLAS) or fitting statistical models (like lm). In fact, if you do want to write such a loop with a compiled language, use lapply function. This function implements loop at C level, and calls R function for each iteration. Alternatively, Ralf's answer is an Rcpp equivalent. In my opinion, a loop nest written in R code is more likely to be inefficient.

`+1`. Thank you for your reply. But I still wonder if someone who does not know `c++`, make use of `Rcpp` for speeding up `R` loops. — mert, Jul 22 '18 at 18:44
@ 李哲源 This code is a kind of `foo` code. I just wonder if I can make use of `Rcpp` for `loops` of the functions including matrices or vectors and some parameters. Those matrix products are arbitrary. — mert, Jul 22 '18 at 18:48
@ 李哲源 Can I improve the performance for a function that is not optimized like `crossprod`? — mert, Jul 22 '18 at 18:50

Ralf Stubner · Accepted Answer · 2018-07-23T09:14:56.540

The answer by 李哲源 correctly identifies what should be done in your case. As for your original question the answer is two-fold: Yes, you can move the loop to C++ with Rcpp. No, you won't gain performance:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericMatrix fillMatrix(Rcpp::NumericMatrix X,
                   Rcpp::NumericVector y,
                   Rcpp::NumericVector a,
                   Rcpp::Function f) {
  Rcpp::NumericMatrix result = Rcpp::no_init(X.cols(), a.length());
  for (int i = 0; i < a.length(); ++i) {
    result(Rcpp::_, i) = Rcpp::as<Rcpp::NumericVector>(f(X, y, a[i]));
  }
  return result;
}

/*** R
f <- function(X,y,a){
  p = ncol(X)
  res = (crossprod(X) + a*diag(1,p))%*%crossprod(X,y)
  }

X <- matrix(rnorm(500*50),500,50)
y <- rnorm(500)
a <- seq(0,1,0.01)

system.time(fillMatrix(X, y, a, f))
#   user  system elapsed 
#  0.052   0.077   0.075 
system.time({result <- matrix(NA,ncol(X),length(a))

for(i in 1:length(a)){
  result[,i] <- f(X,y,a[i])
  }
})
#   user  system elapsed 
#  0.060   0.037   0.049 
*/

So the Rcpp solution is actually slower than the R solution in this case. Why? Because the real work is done within the function f. This is the same for both solutions, but the Rcpp solution has the additional overhead of calling back to R from C++. Note that for loops in R are not necessarily slow. BTW, here some benchmark data:

          expr      min       lq     mean   median       uq      max neval
 fillMatrixR() 41.22305 41.86880 46.16806 45.20537 49.11250 65.03886   100
 fillMatrixC() 44.57131 44.90617 49.76092 50.99102 52.89444 66.82156   100

After I read the excellent solution by @李哲源, I reasoned that nothing could be added here to make this post better... until I read this answer. I really do appreciate your contributions. — Joseph Wood, Jul 23 '18 at 11:48

Can I speedup my R code with Rcpp?

2 Answers2