I am trying to expand on the answers to this question: specifically, how does one build simulations for linear mixed effects models with a random intercept 'from scratch' (without simulate.merMod
or arm
)?
I am asking because I am interested in resampling parameter estimates obtained from fitted models to simulate - rather than predict - responses at new values while also resampling fixed effect parameter estimates, random effect estimates, and the residual variance. I would appreciate feedback on what I'm getting wrong or how to improve the process.
I started from Ben Bolker's suggestion and use mvrnorm()
to draw the new coefficients from the sampling distribution of the fixed-effect parameters.
library(lme4)
mod1 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)
v <- vcov(mod1)
b <- mod1@beta
fixed <- MASS::mvrnorm(1000, mu=b, Sigma=v)
I then reshape the resampled coefficients using melt
, and rename the columns.
library(reshape)
library(dplyr)
fixed.resampled = reshape::melt(fixed, id.vars= "refs") %>%
dplyr::rename(species=X2)
From the answer to a question about plotting random effects, I extract and save the ranef
object, save the variances of intercepts in qq
, and place the intercept estimate, standard deviation, and random effect level in a data frame.
randoms <- ranef(mod1, condVar = TRUE)
qq <- attr(ranef(mod1, condVar = TRUE)[[1]], "postVar")
rand.interc <- randoms$Species
df <- data.frame(Intercepts=randoms$Species[,1],
sd.interc=2*sqrt(qq[,,1:length(qq)]),
lev.names=rownames(rand.interc))
As with the fixed effect coefficients, I draw the random effect estimates from a sampling distribution. Each random effect estimate is sampled from a normal distribution with mean df$Intercepts
and standard deviation df$sd.interc
. I then create an empty array to hold the samples.
ranef.resampled <- array(dim=c(1000,3))
I sample from a normal distribution for each species, using species-specific means and standard deviations.
for(i in 1:length(randoms$Species[,1])){
ranef.resampled[,i] <- rnorm(1000,mean=df$Intercepts[i],sd=df$sd.interc[i])
}
I then reshape and rename the data frame.
ranef.resampled <- data.frame(ranef.resampled) %>%
dplyr::rename(setosa=X1,versicolor=X2,virginica=X3) %>%
gather() %>%
dplyr::rename(species=key)
For plotting purposes, I create data frames that hold the coefficient and random effect estimates.
fixed.e <- data.frame(species=names(fixef(mod1)),estimate=fixef(mod1)[1:2])
ranef.e <- data.frame(species=row.names(rand.interc),estimate=rand.interc[,1])
Bring everything together.
dfLong <- fixed.resampled %>%
dplyr::bind_rows(ranef.resampled) %>%
dplyr::bind_rows(fixed.e) %>%
dplyr::bind_rows(ranef.e) %>%
dplyr::rename(X2=species)
Plot the coefficients and random effect estimates.
ggplot(dfLong, aes (value,group=X2)) +
geom_density() +
geom_vline(data=dfLong,
aes(xintercept=estimate),
color="red",linetype="dashed") +
facet_grid(X2~.) +
theme_classic()
To simulate values, I first create a new data frame to hold values for which to perform the simulation. Here, I generate 1000 samples that match the distribution of the original data for Species="setosa"
.
set.seed(101)
newDf <- data.frame(Sepal.Width = rnorm(1000,mean(iris$Sepal.Width[1:50]), sd(iris$Sepal.Width[1:50])), Species="setosa")
I then use this new data to simulate outcomes using simulate.merMod
(which is what is called when simulate()
is called on a mer
model object.
simulated.setosa <- data.frame(setosa=simulate(mod1, nsim=1, newdata=newDf)[1:1000,])
I want to generate similar predictions without using simulate()
. So far, I have taken the steps below. Sum: the intercept, the parameter estimate for 'width' multiplied with simulated values of 'width', and the resampled random effect estimates for the 'setosa.'
sim.setosa <- data.frame(setosa=fixed[1:1000,1] + fixed[1:1000,2] * newDf$Sepal.Width[1:1000] + ranef.resampled[1:1000,2])
I then use the following to compare the output of these two simulations. While I don't expect them to match perfectly because resampling coefficients is internal to simulate()
, the difference is greater than I expect.
ggplot() +
geom_density(data=simulated.setosa,aes(setosa)) +
geom_density(data=sim.setosa,aes(setosa),col="red") +
theme_classic() +
xlim(c(0,10))