R seems to choose the reference level of a factor in a linear regression automatically, and I cannot change it.
For illustration, I am using the solution posted by Gavin Simpson in another question (How to force R to use a specified factor level as reference in a regression?).
When I run the following code:
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)
m1 <- lm(y ~ x + b, data = DF)
summary(m1)
R appears to use the fifth level of 'b' as the baseline, since it is the only level without a coefficient:
Call:
lm(formula = y ~ x + b, data = DF)
Residuals:
   Min     1Q Median     3Q    Max
-3.974 -1.301 -0.164  1.053  6.091
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.7907     0.1962  19.323  < 2e-16 ***
x             1.4359     0.2189   6.561 2.89e-09 ***
b1           -0.5004     0.3905  -1.281    0.203
b2            0.1293     0.3916   0.330    0.742
b3           -0.1305     0.3904  -0.334    0.739
b4            0.5354     0.3931   1.362    0.176
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.952 on 94 degrees of freedom
Multiple R-squared: 0.3243, Adjusted R-squared: 0.2883
F-statistic: 9.022 on 5 and 94 DF, p-value: 4.954e-07
I get exactly the same result when I try to relevel with the following code:
DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)
So the coefficients for m1 and m2 are identical:
coef(m1)
(Intercept)           x          b1          b2          b3          b4
  3.7907058   1.4358520  -0.5003818   0.1293078  -0.1305475   0.5353815
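To narrow this down, here is a small diagnostic sketch. It rests on an assumption that I have not confirmed: that the contrast coding in my session, rather than relevel() itself, is what keeps the output unchanged. The model name m3 is made up for this check. The block rebuilds the example data, confirms that the level order really changed, prints the contrast settings currently in effect, and fits one model with treatment contrasts requested explicitly for 'b'.

```r
# Rebuild the example data so this block runs on its own
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Move level "3" to the front and confirm the new level order
DF <- within(DF, b <- relevel(b, ref = "3"))
levels(DF$b)

# Contrast functions the session currently uses for unordered
# and ordered factors
getOption("contrasts")

# Coding matrix that lm() would apply to b under those settings
contrasts(DF$b)

# Assumption: requesting treatment contrasts for b in this one model
# bypasses whatever the session-wide option is set to
m3 <- lm(y ~ x + b, data = DF, contrasts = list(b = "contr.treatment"))
names(coef(m3))
```

If the names of coef(m3) contain b1, b2, b4 and b5 but no b3, then relevel() worked and level 3 is the reference in that model, which would suggest that the session-wide contrasts option is worth a closer look.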
I have no idea how to change this or why R behaves this way on my machine. I am using RStudio "Apricot Nasturtium" (aee44535, 2020-09-17) for macOS on macOS Catalina 10.15.7.
Session Info:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] car_3.0-10 carData_3.0-4 rcompanion_2.3.25 tidyr_1.1.2 ez_4.4-0 ggplot2_3.3.2 readr_1.4.0 dplyr_1.0.2
loaded via a namespace (and not attached):
[1] splines_4.0.3 assertthat_0.2.1 expm_0.999-5 statmod_1.4.35 gld_2.6.2 lmom_2.8 stats4_4.0.3 coin_1.3-1 cellranger_1.1.0 pillar_1.4.6
[11] lattice_0.20-41 glue_1.4.2 minqa_1.2.4 colorspace_1.4-1 sandwich_3.0-0 Matrix_1.2-18 plyr_1.8.6 pkgconfig_2.0.3 haven_2.3.1 EMT_1.1
[21] purrr_0.3.4 mvtnorm_1.1-1 scales_1.1.1 openxlsx_4.2.2 rootSolve_1.8.2.1 rio_0.5.16 lme4_1.1-23 tibble_3.0.3 mgcv_1.8-33 generics_0.0.2
[31] ellipsis_0.3.1 TH.data_1.0-10 pacman_0.5.1 withr_2.3.0 cli_2.0.2 survival_3.2-7 magrittr_1.5 crayon_1.3.4 readxl_1.3.1 fansi_0.4.1
[41] nlme_3.1-149 MASS_7.3-53 forcats_0.5.0 foreign_0.8-80 class_7.3-17 tools_4.0.3 data.table_1.13.2 hms_0.5.3 lifecycle_0.2.0 matrixStats_0.57.0
[51] multcomp_1.4-14 stringr_1.4.0 Exact_2.1 munsell_0.5.0 zip_2.1.1 compiler_4.0.3 e1071_1.7-4 multcompView_0.1-8 rlang_0.4.8 grid_4.0.3
[61] nloptr_1.2.2.2 rstudioapi_0.11 boot_1.3-25 DescTools_0.99.38 gtable_0.3.0 codetools_0.2-16 abind_1.4-5 curl_4.3 reshape2_1.4.4 R6_2.4.1
[71] zoo_1.8-8 lubridate_1.7.9 utf8_1.1.4 nortest_1.0-4 libcoin_1.0-6 modeltools_0.2-23 stringi_1.5.3 parallel_4.0.3 Rcpp_1.0.5 vctrs_0.3.4
[81] tidyselect_1.1.0 lmtest_0.9-38