
Thanks in advance for any guidance provided!

I have created a linear mixed-effects model using lmer. The model predicts an athlete's reaction time from two continuous fixed effects: the number of head impacts experienced and the time (in days) between the last head impact and the reaction-time test. I have random intercepts for athlete and test date (127 players and 5 test dates), with a total of 15,413 data points in the model.

anti.lat.impacts <- lmer(
    RT ~ Cumulative.Head.Impacts + Date.Difference +
        (1 | Analysis.ID) + (1 | TestNumber),
    data = Correct.AntiSaccade1,
    REML = FALSE,
    control = lmerControl(optimizer = "Nelder_Mead")
)

I am attempting to run car::Anova() on this model to test whether the fixed effect of cumulative head impacts affects reaction time.

Anova(anti.lat.impacts, type = 3, test = "F")

However, after an hour of waiting for the function to complete, I receive the error message:

Error: vector memory exhausted (limit reached?)

Can anyone provide some assistance on how I might bypass this error?

For reference, I am running

sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.6.8
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

1 Answer


The problem is that, when using test.statistic = "F" (which is also the default), car::Anova() uses the Kenward-Roger correction for models fitted by lmer, and that computation is expensive for large data sets. You might do slightly better by loading the lmerTest package and using its anova() method: the K-R computation there looks slightly faster (although still memory-hungry), and it offers a Satterthwaite option that may be better than car::Anova()'s test.statistic = "Chisq".
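
For example, a minimal sketch assuming the model and data from the question (lmerTest::lmer() masks lme4::lmer(), and the resulting fit's anova() method accepts a ddf argument):

library(lmerTest)  ## masks lme4::lmer(); adds denominator-df methods to anova()

m <- lmer(RT ~ Cumulative.Head.Impacts + Date.Difference +
              (1 | Analysis.ID) + (1 | TestNumber),
          data = Correct.AntiSaccade1, REML = FALSE,
          control = lmerControl(optimizer = "Nelder_Mead"))

## Satterthwaite df (the default here) are far cheaper than Kenward-Roger
anova(m, ddf = "Satterthwaite")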

From the description of the test.statistic argument in ?car::Anova.merMod:

test.statistic: ... for linear mixed models fit by ‘lmer’, whether to calculate Wald ‘"Chisq"’ or Kenward-Roger ‘"F"’ tests with Satterthwaite degrees of freedom (warning: the KR F-tests can be very time-consuming)

(emphasis added). For the default K-R test, elapsed time scales approximately as n^2.5 and memory usage as n^1.6 (see the benchmark below). If you switch to test.statistic = "Chisq", everything should go much faster: based on the scaling shown here I would have expected 15,000 observations to take "only" about 10 minutes, but if you hit a memory limit then things will go much slower ...
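
For the model in the question, that would simply be:

## Wald chi-squared tests skip the Kenward-Roger computation entirely
Anova(anti.lat.impacts, type = 3, test.statistic = "Chisq")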

There is a related issue here.

[Figure: elapsed time (left) and peak RAM (right) versus number of observations on log-log axes, produced by the benchmark code below; the "Chisq" timings are shown in red.]

library(lme4)
library(peakRAM)  ## for measuring peak memory use
library(car)

## simulate a Gaussian response for n observations in ng groups
simfun <- function(n = 500, ng = 20, seed = 101) {
    dd <- data.frame(x = rnorm(n),
                     f = factor(sample(ng, size = n, replace = TRUE))
                     )
    dd$y <- suppressMessages(
        simulate(~ x + (1|f),
                 seed = seed,
                 newdata = dd,
                 newparams = list(theta = 1, beta = c(1, 1), sigma = 1),
                 family = gaussian)[[1]])
    return(dd)
}

## fit the model and measure time/memory used by Anova()
afun <- function(dd, test.statistic = "F") {
    m <- lmer(y ~ x + (1|f), data = dd)
    peakRAM(Anova(m, type = "III", test.statistic = test.statistic))
}


## sample sizes from 100 to ~3200, evenly spaced on a log scale
nvec <- round(10^seq(2, 3.5, length = 21))
res <- lapply(nvec, function(x) {cat(x, "\n"); afun(simfun(x))})
res <- do.call(rbind, res)
res$nvec <- nvec

## repeat with the cheaper Wald chi-squared test
res2 <- lapply(nvec, function(x) {cat(x, "\n"); afun(simfun(x),
                                                     test.statistic = "Chisq")})
res2 <- do.call(rbind, res2)
res2$nvec <- nvec

## log-log slope ~ 2.5: elapsed time grows roughly as n^2.5
lm(log(Elapsed_Time_sec) ~ log(nvec), data = res[res$nvec > 1000, ])
## log-log slope ~ 1.6: peak memory grows roughly as n^1.6
lm(log(Peak_RAM_Used_MiB) ~ log(nvec), data = res[res$nvec > 1000, ])
## plot timing (left) and peak memory (right) against n, log-log
par(mfrow = c(1, 2), las = 1, bty = "l")
plot(Elapsed_Time_sec ~ nvec, data = res, log = "xy", type = "b",
     ylim = c(0.005, 15))
with(res2, lines(nvec, Elapsed_Time_sec, col = 2, type = "b"))  ## "Chisq" timings
plot(Peak_RAM_Used_MiB ~ nvec, data = res, log = "xy", type = "b")
Ben Bolker