
R automatically picks a reference level for factors in my linear regressions, and I cannot change it.

For illustration, I am using the solution that was posted by Gavin Simpson in another question (How to force R to use a specified factor level as reference in a regression?).

When I am using the following code:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)

m1 <- lm(y ~ x + b, data = DF)
summary(m1)

R appears to use the fifth level of 'b' as the baseline, since only b1 to b4 show up in the output:

Call:
lm(formula = y ~ x + b, data = DF)

Residuals:
   Min     1Q Median     3Q    Max 
-3.974 -1.301 -0.164  1.053  6.091 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.7907     0.1962  19.323  < 2e-16 ***
x             1.4359     0.2189   6.561 2.89e-09 ***
b1           -0.5004     0.3905  -1.281    0.203    
b2            0.1293     0.3916   0.330    0.742    
b3           -0.1305     0.3904  -0.334    0.739    
b4            0.5354     0.3931   1.362    0.176    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.952 on 94 degrees of freedom
Multiple R-squared:  0.3243,    Adjusted R-squared:  0.2883 
F-statistic: 9.022 on 5 and 94 DF,  p-value: 4.954e-07

I get exactly the same result when I try to relevel with the following code:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

So coefficients for m1 and m2 are both:

coef(m1)
(Intercept)           x          b1          b2          b3          b4 
  3.7907058   1.4358520  -0.5003818   0.1293078  -0.1305475   0.5353815

I have no idea how to change this or why my R behaves this way. I am using RStudio "Apricot Nasturtium" (aee44535, 2020-09-17) for macOS, running on macOS Catalina 10.15.7.

Session Info:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] car_3.0-10        carData_3.0-4     rcompanion_2.3.25 tidyr_1.1.2       ez_4.4-0          ggplot2_3.3.2     readr_1.4.0       dplyr_1.0.2      

loaded via a namespace (and not attached):
 [1] splines_4.0.3      assertthat_0.2.1   expm_0.999-5       statmod_1.4.35     gld_2.6.2          lmom_2.8           stats4_4.0.3       coin_1.3-1         cellranger_1.1.0   pillar_1.4.6      
[11] lattice_0.20-41    glue_1.4.2         minqa_1.2.4        colorspace_1.4-1   sandwich_3.0-0     Matrix_1.2-18      plyr_1.8.6         pkgconfig_2.0.3    haven_2.3.1        EMT_1.1           
[21] purrr_0.3.4        mvtnorm_1.1-1      scales_1.1.1       openxlsx_4.2.2     rootSolve_1.8.2.1  rio_0.5.16         lme4_1.1-23        tibble_3.0.3       mgcv_1.8-33        generics_0.0.2    
[31] ellipsis_0.3.1     TH.data_1.0-10     pacman_0.5.1       withr_2.3.0        cli_2.0.2          survival_3.2-7     magrittr_1.5       crayon_1.3.4       readxl_1.3.1       fansi_0.4.1       
[41] nlme_3.1-149       MASS_7.3-53        forcats_0.5.0      foreign_0.8-80     class_7.3-17       tools_4.0.3        data.table_1.13.2  hms_0.5.3          lifecycle_0.2.0    matrixStats_0.57.0
[51] multcomp_1.4-14    stringr_1.4.0      Exact_2.1          munsell_0.5.0      zip_2.1.1          compiler_4.0.3     e1071_1.7-4        multcompView_0.1-8 rlang_0.4.8        grid_4.0.3        
[61] nloptr_1.2.2.2     rstudioapi_0.11    boot_1.3-25        DescTools_0.99.38  gtable_0.3.0       codetools_0.2-16   abind_1.4-5        curl_4.3           reshape2_1.4.4     R6_2.4.1          
[71] zoo_1.8-8          lubridate_1.7.9    utf8_1.1.4         nortest_1.0-4      libcoin_1.0-6      modeltools_0.2-23  stringi_1.5.3      parallel_4.0.3     Rcpp_1.0.5         vctrs_0.3.4       
[81] tidyselect_1.1.0   lmtest_0.9-38     
statleo
  • Your RStudio version is irrelevant. The output of `sessionInfo()` would be more useful. Anyway, I can't reproduce your output with your code. It shows what I would expect: `b1` is used as reference level and, after `relevel`, `b3` is used as reference level. – Roland Oct 26 '20 at 09:05
  • What is the output of `options("contrasts")`? If you set the default to `"contr.SAS"`, your output would make sense (although the order of the `b` coefficients would be different after releveling). – Roland Oct 26 '20 at 09:06
  • Thanks Roland, I added the session info to the initial question. The output of `options("contrasts")` is `$contrasts [1] "contr.sum" "contr.poly"`. – statleo Oct 26 '20 at 09:30
  • I added `options(contrasts = c("contr.sum", "contr.poly"))` because of an ANOVA I am also calculating, which should be corrected for unequal sub-sample sizes before the `lm` is performed. Is that potentially the issue, then? – statleo Oct 26 '20 at 09:31
  • OK? So your `lm` calls use sum-to-zero contrasts. There is no reference level then. If you want a reference level, you'll need to change back to the default treatment contrasts. – Roland Oct 26 '20 at 09:33
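
Following the comments above, here is a minimal sketch of what switching back to treatment contrasts could look like, using the data from the question. The object names `old`, `m_sum`, `m_trt` and `m_arg` are illustrative and not from the original post, and the sketch assumes the global `contrasts` option was set to `contr.sum` as described in the comments.

```r
# Recreate the data from the question.
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Assumption: the session uses sum-to-zero contrasts, as reported in the comments.
# options() returns the previous settings, so they can be restored at the end.
old <- options(contrasts = c("contr.sum", "contr.poly"))

# Under contr.sum the columns are unnamed, so the terms are labelled b1..b4.
# They are deviations from the overall mean, not differences from a reference level.
m_sum <- lm(y ~ x + b, data = DF)
names(coef(m_sum))

# Option 1: switch back to R's default treatment contrasts globally,
# then relevel so that level "3" becomes the reference.
options(contrasts = c("contr.treatment", "contr.poly"))
DF$b <- relevel(DF$b, ref = "3")
m_trt <- lm(y ~ x + b, data = DF)
names(coef(m_trt))  # terms are now labelled by level; level 3 is absent (the reference)

# Option 2: keep contr.sum as the global setting (e.g. for the ANOVA)
# and override the coding for this one model via lm()'s contrasts argument.
options(contrasts = c("contr.sum", "contr.poly"))
m_arg <- lm(y ~ x + b, data = DF, contrasts = list(b = "contr.treatment"))

# Restore whatever the session had before running this sketch.
options(old)
```

Option 2 may be the more convenient choice when the sum-to-zero coding is still wanted for the ANOVA, because it leaves the global option untouched and changes the coding only for the one `lm` call.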
