1

I am applying 100 different model simulations to 83 data points and examining the range of estimates from each model simulation for each data point.

Each calculation itself is the product of 342 variables multiplied by 342 coefficients, plus an intercept added. The code I wrote below calculates the values correctly, but is heinously slow. Is there any way to improve processing speed?

spec = read.csv(spectra)
coef = read.csv(coefficents)

shell     = matrix(data=NA,ncol=101,nrow=nrow(spec))
shell     = as.data.frame(shell)
heading = paste("Sim_",seq(1,100,1),sep="")
names(shell)[1] = "Filename"
names(shell)[2:101] = heading

shell[1] = spec[1]

for (i in 1:nrow(spec))
{
  for (j in 1:100)
  {
    shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
  }
}
josliber
  • 43,891
  • 12
  • 98
  • 133
  • possible duplicate of http://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r – Megha Feb 02 '16 at 06:00
  • 2
    http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example is worth a read first - we don't have `spec` `shell` or `coef` and thus can't make any concrete suggestions without representative data to play with. – thelatemail Feb 02 '16 at 06:01
  • 1
    You added new unknown objects: `spectra` and `coefficients` –  Feb 02 '16 at 06:04

1 Answers1

4

Really you are performing matrix multiplication of spec with the transpose of coef and then adding a constant to each column. You should get a speedup by using the built-in matrix multiplication function %*% and vectorized operations to scale the columns:

out <- cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2]))

The output of this 1-liner is identical to the output of the code in the original post (modified slightly to take and output matrices and not to set any names):

OP <- function(spec, coef) {
  shell = matrix(data=NA,ncol=101,nrow=nrow(spec))
  shell[,1] <- spec[,1]
  for (i in 1:nrow(spec)) {
    for (j in 1:100) {
      shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
    }
  }
  shell
}
all.equal(out, OP(spec, coef))
# [1] TRUE

In terms of runtimes, the vectorized operations yield significant benefit (38x), even for this relatively small example (1000 rows in spec):

system.time(cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2])))
#    user  system elapsed 
#   0.028   0.001   0.030 
system.time(OP(spec, coef))
#    user  system elapsed 
#   0.927   0.224   1.161 

Data:

set.seed(144)
spec <- matrix(rnorm(1000*341), nrow=1000)
coef <- matrix(rnorm(100*342), nrow=100)
josliber
  • 43,891
  • 12
  • 98
  • 133