I want to run lmer
regressions for all pairwise combinations of dependent and independent variables (plus some covariates). There are only a few dependent variables, but ~7000 independent ones. Rows are participants: 1000 in the example below, but in practice this needs to scale to 10k-100k. The code runs, but on the full model with more covariates it takes a fair amount of time (~2 hours w/ 8 threads), and I would like to shorten that if possible.
library(data.table)
library(lmerTest) # loads lme4
# Functions -------------------------
get_linreg = function(dat, outcome, exposure){
  model = lmer(get(outcome) ~ get(exposure) + (1|IQ) + HrsTrn + HrsWrk,
               na.action = na.exclude, data = dat)
  # lmerTest's summary() uses Satterthwaite's approximation for p-values
  a = summary(model)
  # row 2 is the exposure term; [-3] drops the df column,
  # keeping estimate, SE, t value, and p-value
  return(as.list(c(a$coefficients[2, -3], nobs = nobs(model))))
}
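(For reference, an equivalent sketch of the same function that builds the formula with reformulate() instead of get(); the model is identical, but the coefficient table then shows the actual variable names rather than get(exposure). The covariates and random intercept are the same as above.)

```r
get_linreg2 = function(dat, outcome, exposure){
  # build e.g. Outcome ~ Exposure + HrsTrn + HrsWrk + (1|IQ) from strings
  f = reformulate(c(exposure, "HrsTrn", "HrsWrk", "(1|IQ)"), response = outcome)
  model = lmer(f, na.action = na.exclude, data = dat)
  a = summary(model)
  as.list(c(a$coefficients[2, -3], nobs = nobs(model)))
}
```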
# Main -------------------------
# test data from (https://towardsdatascience.com/canonical-correlation-analysis-b1a38847219d)
dat = fread('https://articledatas3.s3.eu-central-1.amazonaws.com/CanonicalCorrelationAnalysisData.csv')
# make out table
outs = names(dat)[7:9] # dependent vars
exps = names(dat)[1:3] # independent vars
dt1 = CJ(dx=outs, ix=exps)
# fast run function
dt1[, c("beta", "se", "t stat", "pval", "nobs") := get_linreg(dat, dx, ix), by=dt1[,.I]]
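Since each (dependent, independent) fit is independent of the others, one thing I tried is parallelizing over the rows of the grid explicitly rather than relying on threads inside the fit. A minimal sketch with parallel::mclapply (assumes a Unix-alike, since it forks):

```r
library(parallel)
# fit each row of the grid on its own core, then bind the results back on
res = mclapply(seq_len(nrow(dt1)),
               function(i) get_linreg(dat, dt1$dx[i], dt1$ix[i]),
               mc.cores = 8)
dt2 = cbind(dt1, rbindlist(res))
```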
I think I have it as fast as it will go. Several posts 1 2 have mentioned vectorizing the function for optimization, but both summary()
and nobs()
depend on the fitted model
object, so their results cannot be extracted without fitting each model. The lmer()
function expects a data table with the formula terms as column names, so I don't think I can melt the data either 3.
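One partial shortcut I'm aware of: when only the response changes (i.e., looping over the few outcomes for a fixed exposure), lme4::refit() can reuse the existing model structure instead of rebuilding it from scratch. It doesn't help across the ~7000 exposures, since those change the fixed-effects design, but it may shave time off the outcome loop. A sketch, where Outcome1, Outcome2, and Exposure are placeholder column names:

```r
# fit once, then swap in a new response vector with the same design
base = lmer(Outcome1 ~ Exposure + (1|IQ) + HrsTrn + HrsWrk, data = dat)
m2   = refit(base, newresp = dat$Outcome2)
```

Note that refit() returns a plain merMod, so to get Satterthwaite p-values again you would pass it through lmerTest::as_lmerModLmerTest() first; and NAs in the new response need handling before the call.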