Here's a function that will give you the coefficiencts for every rearrangement of a given factor variable without having to run the model again or specify contrasts:
rearrange_model_factors <- function(model, var)
{
var <- deparse(substitute(var))
coefs <- coef(model)
level_coefs <- grep(paste0("^", var), names(coefs))
coefs[level_coefs] <- coefs[1] + coefs[level_coefs]
used_levels <- gsub(var, "", names(coefs[level_coefs]))
all_levels <- levels(model$model[[var]])
names(coefs)[1] <- paste0(var, setdiff(all_levels, used_levels))
level_coefs <- grep(paste0("^", var), names(coefs))
levs <- coefs[level_coefs]
perms <- gtools::permutations(length(levs), length(levs))
perms <- lapply(seq(nrow(perms)), function(i) levs[perms[i,]])
lapply(perms, function(x) {
x[-1] <- x[-1] - x[1]
coefs[level_coefs] <- x
names(coefs)[level_coefs] <- names(x)
names(coefs)[1] <- "(Intercept)"
coefs
})
}
Suppose you had a model like this:
iris_mod <- lm(Sepal.Width ~ Species + Sepal.Length, data = iris)
To see how your coefficients would change if Species
were in a different order, you would just do:
rearrange_model_factors(iris_mod, Species)
#> [[1]]
#> (Intercept) Speciesversicolor Speciesvirginica Sepal.Length
#> 1.6765001 -0.9833885 -1.0075104 0.3498801
#>
#> [[2]]
#> (Intercept) Speciesvirginica Speciesversicolor Sepal.Length
#> 1.6765001 -1.0075104 -0.9833885 0.3498801
#>
#> [[3]]
#> (Intercept) Speciessetosa Speciesvirginica Sepal.Length
#> 0.69311160 0.98338851 -0.02412184 0.34988012
#>
#> [[4]]
#> (Intercept) Speciesvirginica Speciessetosa Sepal.Length
#> 0.69311160 -0.02412184 0.98338851 0.34988012
#>
#> [[5]]
#> (Intercept) Speciessetosa Speciesversicolor Sepal.Length
#> 0.66898976 1.00751035 0.02412184 0.34988012
#>
#> [[6]]
#> (Intercept) Speciesversicolor Speciessetosa Sepal.Length
#> 0.66898976 0.02412184 1.00751035 0.34988012
Or with your own example:
mtcars$cyl <- as.factor(mtcars$cyl)
rearrange_model_factors(lm(mpg ~ cyl + hp, data = mtcars), cyl)
#> [[1]]
#> (Intercept) cyl6 cyl8 hp
#> 28.65011816 -5.96765508 -8.52085075 -0.02403883
#>
#> [[2]]
#> (Intercept) cyl8 cyl6 hp
#> 28.65011816 -8.52085075 -5.96765508 -0.02403883
#>
#> [[3]]
#> (Intercept) cyl4 cyl8 hp
#> 22.68246309 5.96765508 -2.55319567 -0.02403883
#>
#> [[4]]
#> (Intercept) cyl8 cyl4 hp
#> 22.68246309 -2.55319567 5.96765508 -0.02403883
#>
#> [[5]]
#> (Intercept) cyl4 cyl6 hp
#> 20.12926741 8.52085075 2.55319567 -0.02403883
#>
#> [[6]]
#> (Intercept) cyl6 cyl4 hp
#> 20.12926741 2.55319567 8.52085075 -0.02403883
We need a bit of exposition to see why this works.
Although the function above only runs the model once, let's start by creating a list containing 3 versions of mtcars
, where the baseline factor levels of cyl
are all different.
df_list <- list(mtcars_4 = within(mtcars, cyl <- factor(cyl, c(4, 6, 8))),
mtcars_6 = within(mtcars, cyl <- factor(cyl, c(6, 4, 8))),
mtcars_8 = within(mtcars, cyl <- factor(cyl, c(8, 4, 6))))
Now we can extract the coefficients of your model for all three versions at once using lapply
. For clarity, we will remove the hp
coefficient, which remains static across all three versions anyway:
coefs <- lapply(df_list, function(x) coef(lm(mpg ~ cyl + hp, data = x))[-4])
coefs
#> $mtcars_4
#> (Intercept) cyl6 cyl8
#> 28.650118 -5.967655 -8.520851
#>
#> $mtcars_6
#> (Intercept) cyl4 cyl8
#> 22.682463 5.967655 -2.553196
#>
#> $mtcars_8
#> (Intercept) cyl4 cyl6
#> 20.129267 8.520851 2.553196
Now, we remind ourselves that the coefficient for each factor level is given relative to the baseline level. That means for the non-intercept coefficients, we can simply add the intercept value to their coefficients to get their absolute value. That means that these numbers represent the expected value for mpg
when hp
equals 0 for all three levels of cyl
coefs <- lapply(coefs, function(x) c(x[1], x[-1] + x[1]))
coefs
#> $mtcars_4
#> (Intercept) cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> (Intercept) cyl4 cyl8
#> 22.68246 28.65012 20.12927
#>
#> $mtcars_8
#> (Intercept) cyl4 cyl6
#> 20.12927 28.65012 22.68246
Since we now have all three values as absolutes, let's rename "Intercept" to the appropriate factor level:
coefs <- mapply(function(x, y) { names(x)[1] <- y; x},
x = coefs, y = c("cyl4", "cyl6", "cyl8"), SIMPLIFY = FALSE)
coefs
#> $mtcars_4
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> cyl6 cyl4 cyl8
#> 22.68246 28.65012 20.12927
#>
#> $mtcars_8
#> cyl8 cyl4 cyl6
#> 20.12927 28.65012 22.68246
Finally, let's rearrange the order so we can compare the absolute values of all three factor levels:
coefs <- lapply(coefs, function(x) x[order(names(x))])
coefs
#> $mtcars_4
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_6
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
#>
#> $mtcars_8
#> cyl4 cyl6 cyl8
#> 28.65012 22.68246 20.12927
We can see they are all the same. This is why the ordering of factors is arbitrary in lm
: changing the order of the factor levels gives the same numerical predictions in the end, even if the summary appears different.
TL;DR
So the answer to your question of where do you get the -2.55 if you only have the first model is find the difference between the non-intercept coefficients. In this case
(-8.520851) -(-5.967655)
#> [1] -2.553196
Alternatively, add the intercept on to the non-intercept coefficients and you can see what the intercept would be if any of the levels were baseline and you can get the coefficient for any level relative to any other by simple subtraction. That's how my function rearrange_model_factors
works.
Created on 2020-10-05 by the reprex package (v0.3.0)