I have a mixed-effects model that I am running in the package MCMCglmm in R. There are several possible predictors that I am interested in exploring. How can I carry out model simplification to exclude uninformative predictors from the model?
In other contexts, I have used stepwise backwards regression i.e. starting with a "full" model which contains all of the predictors that I am interested in, and using drop1()
to find uninformative variables, followed by update()
to remove them from the model.
Neither of these functions appear to work with MCMCglmm objects. What is my best alternative? I guess that one reason that drop1()
doesn't work is that the MCMCglmm object does not have an AIC value. Could the DIC value be used in a similar way instead? I don't know much about DIC values, but in the example below they don't seem to be behaving like AIC would: after removing a variable from the model, the DIC is reduced by over 100.
For update()
, clearly it's possible to specify the model again from scratch, but it involves copy/pasting quite a long function call, and gets even more complex with longer model formulae that include interactions. I wondered if there is a simpler alternative.
I'm also wondering if there is a reason that there has not been a method written for these functions to work with MCMCglmm objects - is what I am trying to do in some way invalid?
Reproducible example using partially invented data:
set.seed(1234)
library(MCMCglmm)
data(bird.families)
n <- Ntip(bird.families)
# Create some dummy variables
d <- data.frame(taxon = bird.families$tip.label,
X1 = rnorm(n),
X2 = rnorm(n),
X3 = sample(c("A", "B", "C"), n, replace = T),
X4 = sample(c("A", "B", "C"), n, replace = T))
# Simulate a phenotype composed of phylogenetic, fixed and residual effects
d$phenotype <- rbv(bird.families, 1, nodes="TIPS") +
d$X1*0.7 +
ifelse(d$X3 == "B", 0.5, 0) +
ifelse(d$X3 == "C", 0.8, 0) +
rnorm(n, 0, 1)
# Inverse matrix of shared phyloegnetic history
Ainv <- inverseA(bird.families)$Ainv
# Set priors
prior <- list(R = list(V = 1, nu = 0.002),
G = list(G1 = list(V = 1, nu = 0.002)))
# Run model
model <- MCMCglmm(phenotype ~ X1 + X2 + X3 + X4,
random = ~taxon,
ginverse = list(taxon=Ainv),
data = d,
prior = prior,
verbose = FALSE)
summary(model)
# drop1(model, test = "Chisq") # doesn't work
# update(model, ~.- X2) # doesn't work
model2 <- MCMCglmm(phenotype ~ X1 + X3 + X4,
random = ~taxon,
ginverse = list(taxon=Ainv),
data = d,
prior = prior,
verbose = FALSE)
summary(model2)
Full model output:
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 145.2228
G-structure: ~taxon
post.mean l-95% CI u-95% CI eff.samp
taxon 1.808 0.2167 3.347 17.88
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.6712 0.0003382 1.585 20.98
Location effects: phenotype ~ X1 + X2 + X3 + X4
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.6005279 -1.5753606 0.2689091 1000.0 0.198
X1 0.7506249 0.5220491 1.0033785 506.8 <0.001 ***
X2 -0.0339347 -0.2452923 0.2085228 900.7 0.712
X3B 0.6276852 0.1238884 1.1727409 1000.0 0.022 *
X3C 1.0533919 0.4244092 1.5095804 1000.0 <0.001 ***
X4B -0.0002833 -0.5099753 0.5103896 478.3 0.978
X4C 0.0723440 -0.4435520 0.6193011 1000.0 0.776
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Reduced model output:
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 296.4755
G-structure: ~taxon
post.mean l-95% CI u-95% CI eff.samp
taxon 1.542 0.07175 3.1 26.33
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.8155 0.0004526 1.661 21.89
Location effects: phenotype ~ X1 + X3 + X4
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -0.606920 -1.380267 0.335225 1000.0 0.172
X1 0.750929 0.540667 1.011041 1000.0 <0.001 ***
X3B 0.633123 0.149282 1.243790 1000.0 0.032 *
X3C 1.047085 0.518664 1.570641 1000.0 <0.001 ***
X4B 0.007019 -0.528080 0.512161 509.2 0.998
X4C 0.063093 -0.490103 0.590965 1000.0 0.818
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1