2

I used MuMIn::model.avg to average several models* and I am interested in plotting the predicted results of the conditional (not full) model average. I tried both ggeffects::ggpredict and sjPlot::plot_model, and both only give the full model results. I can get the predicted estimated using predict() which has an option to choose whether to use the full or conditional model (using full = False for conditional). But if I state se.fit = True to get the standard error, then I get a warning saying 'argument 'full' is ignored', and it predicts the results of the full model. I also tried using emmeans following this answer, but it also uses the full model.

*The same problem occurs with simple linear (lm) and with generalized (glm) models.

Is there a way to get the predicted results from a conditional average model and their SE or CI? Or even better, a way to plot them?

I am not sure if my problem is a statistics problem (i.e., what I am asking cannot be done statistically) or an R problem. I hope it is the second but will appreciate an explanation if it is the first.

I didn't add data because I don't think it is relevant, but I can do it if required. All explanatory variables are factors (as you can see in my NewData dataframe).

Here are the few lines of code I tried:

m1 <- lm(A ~ B*C + d, data=df, na.action="na.fail")
dd1 <- dredge(m1, subset=Origin)
m1.avg <- model.avg(dd1, fit=TRUE)
plot_model(m1.avg, type="pred", terms=c("B", "C", "d"))

NewData <- data.frame(B=c(rep(c("b1", "b2"), 6)), 
                      D=c(rep("d1", 6), rep("d2", 6)), 
                      C=c(rep(c("C1", "C1", "C3", "C3", "C5", "C5"), 2)))
cbind(NewData, pre=predict(m1.avg, newdata=NewData, full=F, se.fit=T))
Ron Efrat
  • 41
  • 3

1 Answers1

2

I tried adding the option full to the emmeans support for averaging objects, rather than have it force full = TRUE. Here is what happens with the first example for model.avg:

require(MuMIn)
## Loading required package: MuMIn
require(emmeans)
## Loading required package: emmeans

# Example from Burnham and Anderson (2002), page 100:
fm1 <- lm(y ~ ., data = Cement, na.action = na.fail)
ms1 <- dredge(fm1)
## Fixed term is "(Intercept)"
ma = model.avg(ms1, subset = delta < 4, fit = TRUE)

confint(ref_grid(ma, data = Cement, full = TRUE))
##    X1   X2   X3 X4 prediction   SE df lower.CL upper.CL
##  7.46 48.2 11.8 30         98 4.12  9     88.7      107
## 
## Confidence level used: 0.95

confint(ref_grid(ma, data = Cement, full = FALSE))
## Warning in sqrt(.qf.non0(object@V, x)): NaNs produced
##    X1   X2   X3 X4 prediction  SE df lower.CL upper.CL
##  7.46 48.2 11.8 30         98 NaN  9      NaN      NaN
## 
## Confidence level used: 0.95

Created on 2022-01-06 by the reprex package (v2.0.0)

The problem is that with full = FALSE, the covariance matrix returned is not necessarily positive-definite:

eigen(vcov(ma, full = FALSE))
## eigen() decomposition
## $values
## [1] 494.70528609   0.07488847   0.04051689  -0.01407614  -0.14314881
## 
## $vectors
##              [,1]        [,2]         [,3]        [,4]        [,5]
## [1,]  0.999708163  0.01281974  0.004187958 -0.00648858 -0.01896318
## [2,] -0.004861364  0.86884310  0.042589117 -0.24907880  0.42571582
## [3,] -0.002164749 -0.11468216 -0.696686782 -0.70017769 -0.10593418
## [4,] -0.008651137 -0.14293079  0.715383269 -0.66298278 -0.16785875
## [5,] -0.021918654  0.45974571 -0.031983342  0.09012598 -0.88261420

The fact that some eigenvalues are negative means that we obtain negative variance estimates of certain linear functions of the regression coefficients. That is a deal-breaker for allowing conditional estimates.

[Note: the full option for ref_grid() was just added temporarily; it is not available in any release of emmeans]

Russ Lenth
  • 5,922
  • 2
  • 13
  • 21
  • Thanks a lot! So the bottom line is that it isn't possible. I would love to understand the reason if anyone can explain, but it seems that this is a question for another forum. – Ron Efrat Jan 11 '22 at 11:23
  • I don't think the conditional averages even make sense. The fact that a variable is excluded from a model is valid information that should not be ignored. NASA would not have launched the Challenger if they had considered the data for launches where there were no O-ring failures. But they only looked at those where there were, and the result was a disaster. The conditional average is like ignoring those zeros. – Russ Lenth Jan 11 '22 at 14:29
  • I think it really depends on your question and models. If, e.g., you have two competing models, one with two variables and one with just one of them, then using the full average means you reduce the effect size of one of the variables by 50%, does this make sense? A model that does not have a variable does not mean it has no (0) effect, so why treat it this way when you average models? – Ron Efrat Jan 11 '22 at 18:30
  • @RonEfrat I suggest posting a question on CrossValidated about the appropriateness of using the conditionally averaged model for predictions. – Russ Lenth Jan 12 '22 at 00:04
  • I did so myself. [Here it is](https://stats.stackexchange.com/questions/560242/do-conditional-averaged-coefficients-ever-make-sense). – Russ Lenth Jan 12 '22 at 20:24
  • I am too new in these forums so I can't reply to your question in CV, but I think it will be good if you could add another example: what happens if you have two models with similar weights (thus about 0.5 each), and one of the variables that appears in one of them has a coefficient of 90? Using the full average and by that reducing its coefficient to 45 might as problematic as using the conditional average in your example. – Ron Efrat Jan 13 '22 at 03:46