Speed up taking dot product of all pairs of rows and adding intercept term

Question

I am applying 100 different model simulations to 83 data points and examining the range of estimates from each model simulation for each data point.

Each calculation itself is the product of 342 variables multiplied by 342 coefficients, plus an intercept added. The code I wrote below calculates the values correctly, but is heinously slow. Is there any way to improve processing speed?

spec = read.csv(spectra)
coef = read.csv(coefficents)

shell     = matrix(data=NA,ncol=101,nrow=nrow(spec))
shell     = as.data.frame(shell)
heading = paste("Sim_",seq(1,100,1),sep="")
names(shell)[1] = "Filename"
names(shell)[2:101] = heading

shell[1] = spec[1]

for (i in 1:nrow(spec))
{
  for (j in 1:100)
  {
    shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
  }
}

possible duplicate of http://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r — Megha, Feb 02 '16 at 06:00
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example is worth a read first - we don't have `spec` `shell` or `coef` and thus can't make any concrete suggestions without representative data to play with. — thelatemail, Feb 02 '16 at 06:01

josliber · Accepted Answer · 2016-02-02T06:40:21.217

Really you are performing matrix multiplication of spec with the transpose of coef and then adding a constant to each column. You should get a speedup by using the built-in matrix multiplication function %*% and vectorized operations to scale the columns:

out <- cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2]))

The output of this 1-liner is identical to the output of the code in the original post (modified slightly to take and output matrices and not to set any names):

OP <- function(spec, coef) {
  shell = matrix(data=NA,ncol=101,nrow=nrow(spec))
  shell[,1] <- spec[,1]
  for (i in 1:nrow(spec)) {
    for (j in 1:100) {
      shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
    }
  }
  shell
}
all.equal(out, OP(spec, coef))
# [1] TRUE

In terms of runtimes, the vectorized operations yield significant benefit (38x), even for this relatively small example (1000 rows in spec):

system.time(cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2])))
#    user  system elapsed 
#   0.028   0.001   0.030 
system.time(OP(spec, coef))
#    user  system elapsed 
#   0.927   0.224   1.161

Data:

set.seed(144)
spec <- matrix(rnorm(1000*341), nrow=1000)
coef <- matrix(rnorm(100*342), nrow=100)

Speed up taking dot product of all pairs of rows and adding intercept term

1 Answers1