I have a data matrix in which every column holds the measured concentrations of one substance, and I need to regress every substance on every other substance, with some fixed correction covariates.
As the design matrix changes for every pair of columns, functions like fastLm() from the RcppArmadillo package are not substantially useful (I checked this before).
The very naive and inadvisable idea is to write a nested for loop, like
    Ncol <- ncol(mat)
    mat1 <- mat2 <- matrix(ncol=Ncol, nrow=Ncol) ## matrices where I'll save what I need
    for(i in seq(Ncol)) {
        for(j in seq(Ncol)[-i]) {
            mylm <- lm(mat[,i] ~ mat[,j] + covariates)
            mat1[i,j] <- summary(mylm)$something
            mat2[i,j] <- summary(mylm)$something.else
        }
    }
which is what I am currently running, as I had no better idea. I am not familiar with vectorization techniques, but I am pretty sure they would speed this up considerably.
Does anybody have a suggestion for making the computation faster? I have to run the analysis on 4 datasets, with approximately 300, 650, 800 and 2000 columns respectively.
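
For reference, here is a rough sketch of the kind of vectorization I have in mind. It is not something I have benchmarked on the real data. It assumes that only the slope of mat[,j], its t statistic and its p-value are needed (the `something` above is left open), that there are no missing values, and that covariates is a numeric matrix. It relies on the Frisch-Waugh-Lovell result: after every column has been residualised on the intercept and the covariates once, all pairwise slopes and partial correlations follow from a single cross-product. The function name pairwise_lm and the toy data are made up for illustration.

```r
## Toy data standing in for the real matrices (hypothetical sizes)
set.seed(1)
n <- 50
mat <- matrix(rnorm(n * 6), nrow = n)          # 6 "substances"
covariates <- matrix(rnorm(n * 2), nrow = n)   # 2 correction covariates

## Hypothetical helper: all pairwise regressions mat[,i] ~ mat[,j] + covariates
pairwise_lm <- function(mat, covariates) {
  Z <- cbind(1, covariates)            # intercept plus fixed covariates
  R <- qr.resid(qr(Z), mat)            # residualise every column on Z, once
  C <- crossprod(R)                    # all pairwise cross-products t(R) %*% R
  d <- diag(C)                         # residual sums of squares per column
  df <- nrow(mat) - ncol(Z) - 1        # residual df of each pairwise model
  beta <- sweep(C, 2, d, "/")          # beta[i, j]: slope of column j, response i
  r <- C / sqrt(outer(d, d))           # partial correlations given Z
  diag(r) <- NA                        # a column is never regressed on itself
  diag(beta) <- NA
  tstat <- r * sqrt(df / (1 - r^2))    # t statistic of the slope
  pval <- 2 * pt(-abs(tstat), df)      # two-sided p-value
  list(beta = beta, t = tstat, p = pval)
}

res <- pairwise_lm(mat, covariates)

## Spot check against lm() for one pair
ref <- summary(lm(mat[, 1] ~ mat[, 2] + covariates))$coefficients
all.equal(res$beta[1, 2], unname(ref[2, "Estimate"]))
all.equal(res$t[1, 2], unname(ref[2, "t value"]))
```

If factors are among the covariates, Z would presumably have to be built with model.matrix() instead of cbind(), and other quantities from summary() (R squared, residual standard error, and so on) would need their own formulas.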