I simply suggest avoiding recomputing X'X
and X'y
in the loop as they are loop-invariant.
f <- function (XtX, Xty, a) (XtX + diag(a, ncol(XtX))) %*% Xty
set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)
result1 <- matrix(NA, ncol(X), length(a))
XtX <- crossprod(X)
Xty <- crossprod(X, y)
for(i in 1:length(a)) {
result1[,i] <- f(XtX, Xty, a[i])
}
## compare with your `result`
all.equal(result, result1)
#[1] TRUE
hours later...
When I return I see something more to simplify.
(XtX + diag(a, ncol(XtX))) %*% Xty = XtX %*% Xty + diag(a, ncol(XtX)) %*% Xty
= XtX %*% Xty + a * Xty
So actually, XtX %*% Xty
is loop-invariant, too.
f <- function (XtX.Xty, Xty, a) XtX.Xty + a * Xty
set.seed(0)
X <- matrix(rnorm(50*5),50,5)
y <- rnorm(50)
a <- seq(0,1,0.1)
result2 <- matrix(NA, ncol(X), length(a))
XtX <- crossprod(X)
Xty <- c(crossprod(X, y)) ## one-column matrix to vector
XtX.Xty <- c(XtX %*% Xty) ## one-column matrix to vector
for(i in 1:length(a)) {
result2[,i] <- f(XtX.Xty, Xty, a[i])
}
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
And it turns out that we can get rid of the loop:
## "inline" function `f`
for(i in 1:length(a)) {
result2[,i] <- XtX.Xty + a[i] * Xty
}
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
## do it with recycling rule
for(i in 1:length(a)) {
result2[,i] <- a[i] * Xty
}
result2 <- XtX.Xty + result2
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
## now use `tcrossprod`
result2 <- XtX.Xty + tcrossprod(Xty, a)
## compare with your `result`
all.equal(result, result2)
#[1] TRUE
I completely agree with you that your example code in the question is just a "foo"
. And you may not have thought carefully about it when posting it. However, it suffices to show that when writing a loop, either an R loop or a C / C++ / FORTRAN loop, we should always seek to pull those loop-invariant out of the loop to reduce computational complexity.
Your concern is to get speedup with Rcpp (or any compiled language). You want to vectorize a section of R code that is not readily vectorized. However, "%*%"
, crossprod
and tcrossprod
are mapped to BLAS (FORTRAN code) and are not R-level computations. You readily have everything vectorized.
Don't always blame the interpretation overhead (as R is an interpreted language) of an R-loop for the poor performance. Such overhead is insignificant if each iteration is doing some "heavy" computations, like big matrix computations (using BLAS) or fitting statistical models (like lm
). In fact, if you do want to write such a loop with a compiled language, use lapply
function. This function implements loop at C level, and calls R function for each iteration. Alternatively, Ralf's answer is an Rcpp equivalent. In my opinion, a loop nest written in R code is more likely to be inefficient.