I am new to R. I want to use the emmeans function to calculate estimated marginal means from a model fitted with the lmer function. The problem is that the model has many (about 20) fixed-effect variables and one random-effect variable. lmer runs without any problem; by the way, I convert the 20 or so categorical variables to factors before fitting the model. However, when I call emmeans, I get the following error:
Error: cannot allocate vector of size 49391.4 Gb
I know it is a memory issue. If I build the model with only 2 or 3 variables, emmeans does run, although it takes 20 minutes to finish. The dataset is quite large (about 20k records). Has anyone experienced the same thing? Should I use a different function? Is there any way to make this work in R? I am an SPSS user, and SPSS does not seem to take long to calculate this, so I do not understand why I cannot run it in R.
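A rough sketch of why the memory request could be so large, based on my understanding of how emmeans works (an assumption, not confirmed here): by default it first builds a reference grid containing every combination of the levels of all factors in the model, so the number of grid rows is the product of the numbers of levels. The level counts below (20 factors with 4 levels each) are hypothetical and only illustrate how fast that product grows.

```r
# Hypothetical level counts: 20 categorical predictors with 4 levels each
n_levels <- rep(4, 20)

# The reference grid has one row per combination of factor levels: 4^20 rows
grid_rows <- prod(n_levels)

# Fixed-effect columns: intercept plus (levels - 1) dummy columns per factor
n_coef <- 1 + sum(n_levels - 1)

# Approximate size in GB of a dense grid model matrix (8 bytes per double)
grid_gb <- grid_rows * n_coef * 8 / 1024^3

grid_rows
n_coef
grid_gb
```

With real data, n_levels would be replaced by nlevels() of each factor in the model. With only 2 or 3 factors the product stays small, which would be consistent with the smaller model fitting in memory.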
My R script looks like this:
library(lme4)
mod1 <- lmer(overall ~ age + gender + job + a + b + ... + c + (1 | groupcode), data = dat, REML = TRUE)
res1 <- emmeans::emmeans(mod1, specs = "age")
res2 <- emmeans::emmeans(mod1, specs = "gender")
...
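One option the emmeans documentation describes for large reference grids is the nuisance argument of ref_grid (emmeans passes it through): factors listed as nuisance are averaged over while the grid is built, so they do not multiply its size. As I understand it, this only applies to factors that do not interact with other predictors, which fits an additive model like the one above. The sketch below uses simulated data; the names sim, f1, f2, f3, grp and y are hypothetical, and it assumes a reasonably recent emmeans version that supports nuisance.

```r
library(lme4)
library(emmeans)

set.seed(1)
n <- 600
# Simulated data with hypothetical names: three factors and a grouping factor
sim <- data.frame(
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  f2 = factor(sample(c("x", "y"), n, replace = TRUE)),
  f3 = factor(sample(c("p", "q", "r", "s"), n, replace = TRUE)),
  grp = factor(sample(1:30, n, replace = TRUE))
)
sim$y <- rnorm(n) + as.numeric(sim$f1) + rnorm(30)[as.integer(sim$grp)]

mod <- lmer(y ~ f1 + f2 + f3 + (1 | grp), data = sim)

# Full grid: one row per combination of f1, f2 and f3 (3 * 2 * 4 = 24 rows).
# Asymptotic d.f. only keep this toy example fast and dependency-free.
rg_full <- ref_grid(mod, lmer.df = "asymptotic")

# f2 and f3 as nuisance factors: averaged over up front, 3 rows (one per f1 level)
rg_small <- ref_grid(mod, nuisance = c("f2", "f3"), lmer.df = "asymptotic")

# Marginal means for f1 from each grid
emm_full <- emmeans(rg_full, specs = "f1")
emm_small <- emmeans(rg_small, specs = "f1")
emm_full
emm_small
```

Both grids average over f2 and f3 with equal weights by default, so the two sets of means for f1 should agree while the nuisance version never builds the full 24-row grid.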
Follow-up: I found some freely available data online so that I could try to reproduce the issue. I could not reproduce it exactly, but the example does show that the emmeans function takes far too long. With a bigger dataset and more variables, it does not run at all. Here is the code:
library(dplyr)
library(stringr)
rm(list = ls())
#data source
#http://www.bristol.ac.uk/cmm/learning/support/datasets/
#bottom of the page: Multilevel ordinal models for examination grades database (zip, 0.9 mb)
# unzip the file and save it under C:\momeg\
# I used the file a-level-geography.txt
#import data
dat <- read.csv("C:\\momeg\\a-level-geography.txt", header = FALSE, sep = "")
#assign column names
colnames(dat) <- c("A-SCORE", "BOARD", "GCSE-G-SCORE", "GENDER", "GTOT", "GNUM", "GCSE-MA-MAX", "GCSE-math-n", "AGE",
"INST-GA-MN", "INST-GA-SD", "INSTTYPE", "LEA", "INSTITUTE", "STUDENT") %>%
tolower(.) %>%
str_replace_all(., "-", "_")
#number of records
nrow(dat)
# center the score
dat$a_score <- dat$a_score - mean(dat$a_score)
# set up categorical variables as factors
dat$gender <- factor(dat$gender)
dat$age <- factor(dat$age)
dat$gcse_g_score <- factor(dat$gcse_g_score)
dat$gcse_math_n <- factor(dat$gcse_math_n)
dat$insttype <- factor(dat$insttype)
library(lme4)
library(emmeans)
#run model
mod1 <- lmer(a_score ~ age + gender + gcse_g_score + gcse_math_n + insttype + (1 | institute), data=dat, REML=T)
summary(mod1)
#get emmean
emm_options(pbkrtest.limit = 50000) # raise the limit to avoid the note that d.f. calculations have been disabled
start.time <- Sys.time() # time how long R takes to run the emmeans function
age.means <- emmeans::emmeans(mod1, specs = "age")
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
The emmeans call has now been running for over an hour and still has not finished. Why does it take so long?
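A possible factor in the follow-up example, based on the emmeans documentation rather than verified on this dataset: for lmer models emmeans uses Kenward-Roger degrees of freedom (via pbkrtest) by default, and switches that off above pbkrtest.limit (3000 observations by default) because the computation becomes very expensive. Setting pbkrtest.limit = 50000 turns it back on for the whole dataset. The sketch below requests asymptotic (z-based) inference instead, which skips the d.f. computation entirely; the simulated data and the names sim, f1, grp and y are hypothetical.

```r
library(lme4)
library(emmeans)

set.seed(2)
n <- 5000
# Simulated data with hypothetical names; n exceeds the default pbkrtest.limit
sim <- data.frame(
  f1 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  grp = factor(sample(1:100, n, replace = TRUE))
)
sim$y <- rnorm(n) + as.numeric(sim$f1) + rnorm(100)[as.integer(sim$grp)]

mod <- lmer(y ~ f1 + (1 | grp), data = sim)

# Asymptotic inference: no Kenward-Roger or Satterthwaite d.f. are computed,
# so the cost does not grow with those expensive calculations
start_time <- Sys.time()
emm_asym <- emmeans(mod, specs = "f1", lmer.df = "asymptotic")
Sys.time() - start_time
emm_asym
```

If finite d.f. are needed, lmer.df = "satterthwaite" (which relies on the lmerTest package and has its own lmerTest.limit option) is typically much cheaper than Kenward-Roger on large data, though I have not timed it on this dataset.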