Simulations from a linear mixed model with random intercepts

Question

I am trying to expand on the answers to this question: specifically, how does one build simulations for linear mixed effects models with a random intercept 'from scratch' (without simulate.merMod or arm)?

I am asking because I am interested in resampling parameter estimates obtained from fitted models to simulate - rather than predict - responses at new values while also resampling fixed effect parameter estimates, random effect estimates, and the residual variance. I would appreciate feedback on what I'm getting wrong or how to improve the process.

I started from Ben Bolker's suggestion and use mvrnorm() to draw the new coefficients from the sampling distribution of the fixed-effect parameters.

library(lme4)
mod1 <- lmer(Sepal.Length ~ Sepal.Width + (1 | Species), data = iris)

v <- vcov(mod1)
b <- mod1@beta

fixed <- MASS::mvrnorm(1000, mu=b, Sigma=v)

I then reshape the resampled coefficients using melt, and rename the columns.

library(reshape)
library(dplyr)
fixed.resampled = reshape::melt(fixed, id.vars= "refs") %>%
  dplyr::rename(species=X2)

From the answer to a question about plotting random effects, I extract and save the ranef object, save the variances of intercepts in qq, and place the intercept estimate, standard deviation, and random effect level in a data frame.

randoms <- ranef(mod1, condVar = TRUE)
qq <- attr(ranef(mod1, condVar = TRUE)[[1]], "postVar")
rand.interc <- randoms$Species
df <- data.frame(Intercepts=randoms$Species[,1],
               sd.interc=2*sqrt(qq[,,1:length(qq)]),
               lev.names=rownames(rand.interc))

As with the fixed effect coefficients, I draw the random effect estimates from a sampling distribution. Each random effect estimate is sampled from a normal distribution with mean df$Intercepts and standard deviation df$sd.interc. I then create an empty array to hold the samples.

ranef.resampled <- array(dim=c(1000,3))

I sample from a normal distribution for each species, using species-specific means and standard deviations.

for(i in 1:length(randoms$Species[,1])){
  ranef.resampled[,i] <- rnorm(1000,mean=df$Intercepts[i],sd=df$sd.interc[i])
}

I then reshape and rename the data frame.

 ranef.resampled <- data.frame(ranef.resampled) %>%
   dplyr::rename(setosa=X1,versicolor=X2,virginica=X3) %>%
   gather() %>%
   dplyr::rename(species=key)

For plotting purposes, I create data frames that hold the coefficient and random effect estimates.

fixed.e <- data.frame(species=names(fixef(mod1)),estimate=fixef(mod1)[1:2])
ranef.e <- data.frame(species=row.names(rand.interc),estimate=rand.interc[,1])

Bring everything together.

dfLong <- fixed.resampled %>% 
          dplyr::bind_rows(ranef.resampled) %>%
          dplyr::bind_rows(fixed.e) %>%
          dplyr::bind_rows(ranef.e) %>%
          dplyr::rename(X2=species)

Plot the coefficients and random effect estimates.

ggplot(dfLong, aes (value,group=X2)) +
  geom_density() + 
  geom_vline(data=dfLong,
    aes(xintercept=estimate),
    color="red",linetype="dashed") +
  facet_grid(X2~.) + 
  theme_classic()

To simulate values, I first create a new data frame to hold values for which to perform the simulation. Here, I generate 1000 samples that match the distribution of the original data for Species="setosa".

set.seed(101)
newDf <- data.frame(Sepal.Width = rnorm(1000,mean(iris$Sepal.Width[1:50]), sd(iris$Sepal.Width[1:50])), Species="setosa")

I then use this new data to simulate outcomes using simulate.merMod (which is what is called when simulate() is called on a mer model object.

simulated.setosa <- data.frame(setosa=simulate(mod1, nsim=1, newdata=newDf)[1:1000,])

I want to generate similar predictions without using simulate(). So far, I have taken the steps below. Sum: the intercept, the parameter estimate for 'width' multiplied with simulated values of 'width', and the resampled random effect estimates for the 'setosa.'

sim.setosa <- data.frame(setosa=fixed[1:1000,1] + fixed[1:1000,2] * newDf$Sepal.Width[1:1000] + ranef.resampled[1:1000,2])

I then use the following to compare the output of these two simulations. While I don't expect them to match perfectly because resampling coefficients is internal to simulate(), the difference is greater than I expect.

ggplot() +
  geom_density(data=simulated.setosa,aes(setosa)) +
  geom_density(data=sim.setosa,aes(setosa),col="red") +
  theme_classic() +
  xlim(c(0,10))

Simulations from a linear mixed model with random intercepts

0 Answers0