Get model estimates from another reference level, without running new model?

Question

I am wondering if there is a simple way to change what values are in the intercept, perhaps mathematically, without re-running large models. As an example:

mtcars$cyl<-as.factor(mtcars$cyl)

summary(
  lm(mpg~cyl+hp,data=mtcars)
)

Output:

    Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-4.818 -1.959  0.080  1.627  6.812 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 28.65012    1.58779  18.044  < 2e-16 ***
cyl6        -5.96766    1.63928  -3.640  0.00109 ** 
cyl8        -8.52085    2.32607  -3.663  0.00103 ** 
hp          -0.02404    0.01541  -1.560  0.12995    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared:  0.7539,    Adjusted R-squared:  0.7275 
F-statistic: 28.59 on 3 and 28 DF,  p-value: 1.14e-08

Now I can change the reference level to 6 cyl, and can see how 8 cyl now compares to 6 cyl, rather than 4 cyl:

mtcars$cyl<-relevel(mtcars$cyl,"6")

summary(
  lm(mpg~cyl+hp,data=mtcars)
)

Output:

Call:
lm(formula = mpg ~ cyl + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-4.818 -1.959  0.080  1.627  6.812 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.68246    2.22805   10.18 6.48e-11 ***
cyl4         5.96766    1.63928    3.64  0.00109 ** 
cyl8        -2.55320    1.97867   -1.29  0.20748    
hp          -0.02404    0.01541   -1.56  0.12995    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.146 on 28 degrees of freedom
Multiple R-squared:  0.7539,    Adjusted R-squared:  0.7275 
F-statistic: 28.59 on 3 and 28 DF,  p-value: 1.14e-08

What I am wondering is there a way to get these values without re-running a model? You can see that the comparison from 4 cyl to 6 cyl is the same in each model (-5.96 and 5.96), but how would I get the estimate for the 'other' coefficient in either model (e.g. the -2.55 from the first model). Of course in this case, it takes a fraction of a second to run the other model. But with very large models, it would be convenient to be able to change reference level without re-running. Are there relatively simple ways to convert all of the estimates and standard errors to be based off of a different reference level, or is it too complicated to do such a thing?

Any solutions for lme4, glmmTMB, or rstanarm models would be appreciated.

There is also `multcomp::glht` which can allow you to build up queries manually. — user20650, Oct 05 '20 at 21:16
potentially helpful : https://stats.stackexchange.com/questions/168650/how-to-set-custom-contrasts-with-lmer-in-r — Mike, Oct 06 '20 at 18:59

Allan Cameron · Answer 1 · 2020-10-06T18:07:17.130

Here's a function that will give you the coefficiencts for every rearrangement of a given factor variable without having to run the model again or specify contrasts:

rearrange_model_factors <- function(model, var)
{
  var   <- deparse(substitute(var))
  coefs <- coef(model)
  level_coefs <- grep(paste0("^", var), names(coefs))
  coefs[level_coefs] <- coefs[1] + coefs[level_coefs]
  used_levels <- gsub(var, "", names(coefs[level_coefs]))
  all_levels <- levels(model$model[[var]])
  names(coefs)[1] <- paste0(var, setdiff(all_levels, used_levels))
  level_coefs <- grep(paste0("^", var), names(coefs))
  levs <- coefs[level_coefs]
  perms <- gtools::permutations(length(levs), length(levs))
  perms <- lapply(seq(nrow(perms)), function(i) levs[perms[i,]])

  lapply(perms, function(x) { 
    x[-1] <- x[-1] - x[1]
    coefs[level_coefs] <- x
    names(coefs)[level_coefs] <- names(x)
    names(coefs)[1] <- "(Intercept)"
    coefs
    })
}

Suppose you had a model like this:

iris_mod <- lm(Sepal.Width ~ Species + Sepal.Length, data = iris)

To see how your coefficients would change if Species were in a different order, you would just do:

rearrange_model_factors(iris_mod, Species)
#> [[1]]
#>       (Intercept) Speciesversicolor  Speciesvirginica      Sepal.Length 
#>         1.6765001        -0.9833885        -1.0075104         0.3498801 
#> 
#> [[2]]
#>       (Intercept)  Speciesvirginica Speciesversicolor      Sepal.Length 
#>         1.6765001        -1.0075104        -0.9833885         0.3498801 
#> 
#> [[3]]
#>      (Intercept)    Speciessetosa Speciesvirginica     Sepal.Length 
#>       0.69311160       0.98338851      -0.02412184       0.34988012 
#> 
#> [[4]]
#>      (Intercept) Speciesvirginica    Speciessetosa     Sepal.Length 
#>       0.69311160      -0.02412184       0.98338851       0.34988012 
#> 
#> [[5]]
#>       (Intercept)     Speciessetosa Speciesversicolor      Sepal.Length 
#>        0.66898976        1.00751035        0.02412184        0.34988012 
#> 
#> [[6]]
#>       (Intercept) Speciesversicolor     Speciessetosa      Sepal.Length 
#>        0.66898976        0.02412184        1.00751035        0.34988012

Or with your own example:

mtcars$cyl <- as.factor(mtcars$cyl)

rearrange_model_factors(lm(mpg ~ cyl + hp, data = mtcars), cyl)
#> [[1]]
#> (Intercept)        cyl6        cyl8          hp 
#> 28.65011816 -5.96765508 -8.52085075 -0.02403883 
#> 
#> [[2]]
#> (Intercept)        cyl8        cyl6          hp 
#> 28.65011816 -8.52085075 -5.96765508 -0.02403883 
#> 
#> [[3]]
#> (Intercept)        cyl4        cyl8          hp 
#> 22.68246309  5.96765508 -2.55319567 -0.02403883 
#> 
#> [[4]]
#> (Intercept)        cyl8        cyl4          hp 
#> 22.68246309 -2.55319567  5.96765508 -0.02403883 
#> 
#> [[5]]
#> (Intercept)        cyl4        cyl6          hp 
#> 20.12926741  8.52085075  2.55319567 -0.02403883 
#> 
#> [[6]]
#> (Intercept)        cyl6        cyl4          hp 
#> 20.12926741  2.55319567  8.52085075 -0.02403883

We need a bit of exposition to see why this works.

Although the function above only runs the model once, let's start by creating a list containing 3 versions of mtcars, where the baseline factor levels of cyl are all different.

df_list <- list(mtcars_4 = within(mtcars, cyl <- factor(cyl, c(4, 6, 8))),
                mtcars_6 = within(mtcars, cyl <- factor(cyl, c(6, 4, 8))),
                mtcars_8 = within(mtcars, cyl <- factor(cyl, c(8, 4, 6))))

Now we can extract the coefficients of your model for all three versions at once using lapply. For clarity, we will remove the hp coefficient, which remains static across all three versions anyway:

coefs <- lapply(df_list, function(x) coef(lm(mpg ~ cyl + hp, data = x))[-4])

coefs
#> $mtcars_4
#> (Intercept)        cyl6        cyl8 
#>   28.650118   -5.967655   -8.520851 
#> 
#> $mtcars_6
#> (Intercept)        cyl4        cyl8 
#>   22.682463    5.967655   -2.553196 
#> 
#> $mtcars_8
#> (Intercept)        cyl4        cyl6 
#>   20.129267    8.520851    2.553196

Now, we remind ourselves that the coefficient for each factor level is given relative to the baseline level. That means for the non-intercept coefficients, we can simply add the intercept value to their coefficients to get their absolute value. That means that these numbers represent the expected value for mpg when hp equals 0 for all three levels of cyl

coefs <- lapply(coefs, function(x) c(x[1], x[-1] + x[1]))

coefs
#> $mtcars_4
#> (Intercept)        cyl6        cyl8 
#>    28.65012    22.68246    20.12927 
#> 
#> $mtcars_6
#> (Intercept)        cyl4        cyl8 
#>    22.68246    28.65012    20.12927 
#> 
#> $mtcars_8
#> (Intercept)        cyl4        cyl6 
#>    20.12927    28.65012    22.68246

Since we now have all three values as absolutes, let's rename "Intercept" to the appropriate factor level:

coefs <- mapply(function(x, y) { names(x)[1] <- y; x}, 
                x = coefs, y = c("cyl4", "cyl6", "cyl8"), SIMPLIFY = FALSE)
coefs
#> $mtcars_4
#>     cyl4     cyl6     cyl8 
#> 28.65012 22.68246 20.12927 
#> 
#> $mtcars_6
#>     cyl6     cyl4     cyl8 
#> 22.68246 28.65012 20.12927 
#> 
#> $mtcars_8
#>     cyl8     cyl4     cyl6 
#> 20.12927 28.65012 22.68246

Finally, let's rearrange the order so we can compare the absolute values of all three factor levels:

coefs <- lapply(coefs, function(x) x[order(names(x))])

coefs
#> $mtcars_4
#>     cyl4     cyl6     cyl8 
#> 28.65012 22.68246 20.12927 
#> 
#> $mtcars_6
#>     cyl4     cyl6     cyl8 
#> 28.65012 22.68246 20.12927 
#> 
#> $mtcars_8
#>     cyl4     cyl6     cyl8 
#> 28.65012 22.68246 20.12927

We can see they are all the same. This is why the ordering of factors is arbitrary in lm: changing the order of the factor levels gives the same numerical predictions in the end, even if the summary appears different.

TL;DR

So the answer to your question of where do you get the -2.55 if you only have the first model is find the difference between the non-intercept coefficients. In this case

(-8.520851) -(-5.967655) 
#> [1] -2.553196

Alternatively, add the intercept on to the non-intercept coefficients and you can see what the intercept would be if any of the levels were baseline and you can get the coefficient for any level relative to any other by simple subtraction. That's how my function rearrange_model_factors works.

^{Created on 2020-10-05 by the reprex package (v0.3.0)}

I don't think this does what the OP wants: they want to be able to retrieve the coefficients/std errs etc. for a reparameterized model *without refitting the model*. Fitting all possible levels in advance doesn't satisfy that request, I think. — Ben Bolker, Oct 05 '20 at 22:39
@BenBolker that's not what I'm suggesting. I'm just showing the equivalence of different orderings of factors. The OP asks "_but how would I get the estimate for the 'other' coefficient in either model (e.g. the -2.55 from the first model)._" - and I am showing how — Allan Cameron, Oct 05 '20 at 22:42
@BenBolker what I mean is how to retrieve the coefficients of a differently ordered factor _without_ rerunning the model. Sorry my exposition wasn't clearer. — Allan Cameron, Oct 05 '20 at 22:46
you are showing how to do it by hand. I think the OP is looking for machinery ... — Ben Bolker, Oct 05 '20 at 22:51
@BenBolker yes, you're probably right. I'll tidy up my answer and add a function to automate this in the morning I think - thanks. — Allan Cameron, Oct 05 '20 at 22:54
@BenBolker updated with machinery. I _think_ this was what the OP was looking for. — Allan Cameron, Oct 06 '20 at 12:22
This is a thorough explanation of how to calculate estimates - thank you for this. Is it as straightforward to calculate SE/CI? I am guessing not. Of course my actual model is a much more complicated `rstanarm` model with random slopes for interaction terms, so I am trying to figure out a way to manually plot estimates and CI around those estimates. Unfortunately I am having trouble wrapping my head around pulling some of these terms out of the intercept. Fortunately, the interaction is with a 2-level factor, so I only need to run two models to have the estimates & uncertainty. — Dylan_Gomes, Oct 07 '20 at 20:18
@Dylan_Gomes no, the SE and CI are more difficult. However,now you know you only need to add the intercept to the factor coefficients to get the absolute values at each level, and the SE will remain the same - you can calculate the CI for each one using this fact. — Allan Cameron, Oct 07 '20 at 20:29

Get model estimates from another reference level, without running new model?

1 Answers1