160

How can I tell R to use a certain level as reference if I use binary explanatory variables in a regression?

It's just using some level by default.

lm(x ~ y + as.factor(b)) 

with b {0, 1, 2, 3, 4}. Let's say I want to use 3 instead of the zero that is used by R.

An economist
Matt Bannert
  • You should do the data processing step outside of the model formula/fitting. When creating the factor from `b` you can specify the ordering of the levels using `factor(b, levels = c(3,0,1,2,4))`, which puts 3 first so that it becomes the reference. Do this in a data processing step outside the `lm()` call though. My answer below uses the `relevel()` function so you can create a factor and then shift the reference level around to suit as you need to. – Gavin Simpson Oct 06 '10 at 12:14
  • I reworded your question. You're actually after changing the reference level, not leaving one out. – Joris Meys Oct 06 '10 at 12:39
  • thx for rewording my question. Indeed, relevel() was what I was looking for. Thx for the detailed answer and the example though. I am not sure if the linear-regression tag is a bit misleading because this applies to all kinds of regression using dummy explanatories... – Matt Bannert Oct 07 '10 at 08:52

6 Answers

198

See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)

m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the factor b in DF by use of the relevel() function:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

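The two fits are the same model with different reference levels: m1 uses level 1 and m2 uses level 3, so the intercept and the b coefficients change while the coefficient for x does not.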

> coef(m1)
(Intercept)           x          b2          b3          b4          b5 
  3.2903239   1.4358520   0.6296896   0.3698343   1.0357633   0.4666219 
> coef(m2)
(Intercept)           x          b1          b2          b4          b5 
 3.66015826  1.43585196 -0.36983433  0.25985529  0.66592898  0.09678759
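
As a quick sanity check on the output above, here is a minimal sketch that refits both models and confirms that releveling only changes the parameterisation. The names same_fit and intercept_shift are just illustrative helpers, not part of the original answer.

```r
# Recreate the example data from the answer
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# m1 uses the default reference level "1"; m2 uses level "3"
m1 <- lm(y ~ x + b, data = DF)
m2 <- lm(y ~ x + b, data = transform(DF, b = relevel(b, ref = "3")))

# The fitted values are identical: only the baseline has moved
same_fit <- isTRUE(all.equal(fitted(m1), fitted(m2)))

# m2's intercept is m1's intercept plus m1's effect for level 3
intercept_shift <- unname(coef(m1)["(Intercept)"] + coef(m1)["b3"])
```

The b1 coefficient in m2 is simply the negative of b3 in m1, which matches the printed values (0.3698343 and -0.36983433).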
Gavin Simpson
83

I know this is an old question, but I had a similar issue and found that:

lm(x ~ y + relevel(b, ref = "3")) 

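does exactly what you asked, provided b is already a factor. If b is still numeric, as in the question, wrap it first: relevel(as.factor(b), ref = "3").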
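
A small sketch of the in-formula version, assuming b holds the numeric codes 0 to 4 from the question. The data here are made up purely for illustration.

```r
# Made-up data with b coded 0..4, as in the question
set.seed(1)
b <- rep(0:4, times = 20)
y <- rnorm(100)
x <- y + rnorm(100)

# relevel() only accepts factors, so a numeric b has to be converted first
numeric_fails <- inherits(try(relevel(b, ref = "3"), silent = TRUE), "try-error")

# Level "3" moves to the front and becomes the reference
new_levels <- levels(relevel(as.factor(b), ref = "3"))

# The same expression can be used directly inside the formula
fit <- lm(x ~ y + relevel(as.factor(b), ref = "3"))

# Strip the long term prefix to see which levels received a coefficient
coef_levels <- sub(".*\\)", "", names(coef(fit))[-(1:2)])
```

The coefficient names get long with this approach, which is one reason to relevel in a separate data step if you need to read the summary often.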

Yan Alperovych
  • This was a big help! Only solution that included a way to do it within the lm() command which was exactly what I needed. Thanks! – cparmstrong Jan 11 '18 at 18:20
  • This is a very flexible way of working with factors. I like the fact that I can combine it with `as.factor()` if needed, for instance by using `...+relevel(as.factor(mycol), ref = "myref")+...` – Peter Dec 11 '18 at 12:47
  • This is by far the best solution here! I love it. – Sam Asin Sep 16 '20 at 19:17
40

Others have mentioned the relevel command which is the best solution if you want to change the base level for all analyses on your data (or are willing to live with changing the data).

If you don't want to change the data (this is a one-time change, but in the future you want the default behavior again), then you can use a combination of the C (note uppercase) function to set contrasts and the contr.treatment function with the base argument for choosing which level you want to be the baseline.

For example:

lm( Sepal.Width ~ C(Species,contr.treatment(3, base=2)), data=iris )
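
To see what base = 2 does here, a short sketch comparing the C() fit with a relevel() fit on the same data. The second level of Species is versicolor, so both should use it as the baseline; the object names are just for illustration.

```r
# Baseline chosen through the contrast matrix; iris itself is not modified
fit_C <- lm(Sepal.Width ~ C(Species, contr.treatment(3, base = 2)), data = iris)

# Equivalent fit using relevel() inside the formula
fit_relevel <- lm(Sepal.Width ~ relevel(Species, ref = "versicolor"), data = iris)

# With versicolor as baseline, the intercept is the versicolor group mean
versicolor_mean <- mean(iris$Sepal.Width[iris$Species == "versicolor"])

# Coefficient names differ between the two fits, so compare values only
same_coefs <- isTRUE(all.equal(unname(coef(fit_C)), unname(coef(fit_relevel))))

# The level order in the data is untouched
levels(iris$Species)
```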
Hack-R
Greg Snow
35

The relevel() command is shorthand for what you are asking: it reorders the factor's levels so that the ref level comes first. Reordering the levels yourself therefore has the same effect, but gives you more control. Perhaps you want the levels in the order 3, 4, 0, 1, 2. In that case...

bFactor <- factor(b, levels = c(3,4,0,1,2))

I prefer this method because it's easier for me to see in my code not only what the reference was but the position of the other values as well (rather than having to look at the results for that).

NOTE: DO NOT make it an ordered factor. A factor with a specified order and an ordered factor are not the same thing. lm() may start to think you want polynomial contrasts if you do that.
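
To make the distinction concrete, a minimal sketch assuming b is coded 0 to 4 as in the question and R's default contrast options are in effect. The names bOrdered and bFixed are hypothetical.

```r
b <- rep(0:4, times = 4)

# Unordered factor with an explicit level order: first level is the reference
bFactor <- factor(b, levels = c(3, 4, 0, 1, 2))

# Ordered factor with the same level order (what the note warns against)
bOrdered <- factor(b, levels = c(3, 4, 0, 1, 2), ordered = TRUE)

# Treatment contrasts: one dummy column per non-reference level
treatment_cols <- colnames(contrasts(bFactor))

# Polynomial contrasts: linear, quadratic, cubic, quartic terms
poly_cols <- colnames(contrasts(bOrdered))

# If a factor ended up ordered by accident, this drops the ordering
bFixed <- factor(bOrdered, ordered = FALSE)
```

Setting the levels when the factor is created, as in the first call, also defines the reference level in a single step without a later relevel().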

dpel
John
  • Polynomial contrasts, not a polynomial regression. – hadley Oct 06 '10 at 13:31
  • Is there a way to set the reference level at the same time that you define the factor, rather than in a subsequent call to relevel? – David Bruce Borenstein Oct 18 '16 at 15:11
  • For some reason R is treating my variable as being ordered (it is just a bunch of strings), any idea of how to fix this? – Mark Jul 08 '21 at 19:04
  • It's best to ask new questions separately. You can link to this one and say you are expanding it but it's best not to hide new stuff in the comments. – John Jul 09 '21 at 20:58
12

You can also manually tag the column with a contrasts attribute, which seems to be respected by the regression functions:

contrasts(df$factorcol) <- contr.treatment(levels(df$factorcol),
   base=which(levels(df$factorcol) == 'RefLevel'))
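
Here is a runnable sketch of the same idea on a copy of iris, with Species standing in for factorcol and "versicolor" standing in for 'RefLevel'. Both substitutions are just for illustration.

```r
# Work on a copy so the built-in iris data stay untouched
df <- iris
ref <- "versicolor"

# Attach treatment contrasts with the chosen baseline to the column
contrasts(df$Species) <- contr.treatment(levels(df$Species),
   base = which(levels(df$Species) == ref))

# lm() picks up the contrasts attribute; no relevel() or C() needed
fit <- lm(Sepal.Width ~ Species, data = df)
coef_names <- names(coef(fit))

# The intercept is now the mean of the reference group
ref_mean <- mean(df$Sepal.Width[df$Species == ref])
```

Unlike relevel(), this leaves the level order alone, so plots and tables that depend on it keep their original ordering.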
Harlan
3

For those looking for a dplyr/tidyverse version, building on Gavin Simpson's solution:

library(dplyr)  # provides %>% and mutate()

# Create DF
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

# Change reference level to "3"
DF <- DF %>% mutate(b = relevel(b, ref = "3"))

m2 <- lm(y ~ x + b, data = DF)
summary(m2)
Gorka
  • I'm confused why you put "If the variable is a factor" where you did... this is necessary whether you use `relevel()` or `forcats::fct_relevel()` – Gregor Thomas Oct 24 '19 at 13:30
  • You are correct, thanks! I added "you can also use", because, afaik, fct_relevel only works with factors. – Gorka Oct 25 '19 at 14:32
  • `relevel` only works with factors. `fct_relevel` only works with factors. There isn't any difference between the functions except the name, AFAIK. Saying "If the variable is a factor you can also use `fct_relevel`" implies that if the variable is *not a factor* you could use `relevel`, but that is not true. – Gregor Thomas Oct 25 '19 at 14:38