
I am going nuts trying to figure this out. In R, how can I define the reference level to use in a binary logistic regression? What about a multinomial logistic regression? Right now my code is:

logistic.train.model3 <- glm(class~ x+y+z,
                         family=binomial(link=logit), data=auth, na.action = na.exclude)

My response variable takes the values "YES" and "NO". I want to predict the probability of someone responding with "YES".

I DO NOT want to recode the variable to 0 / 1. Is there a way I can tell the model to predict "YES" ?

Thank you for your help.

blast00
  • Make class a factor - see `?factor` – user20650 Apr 25 '14 at 00:38
  • My class variable, is a factor variable. I am looking to assign the event, or the value of class that the logistic regression predicts. Right now it is predicting "NO", I want it to predict "YES". – blast00 Apr 25 '14 at 00:43
  • ok use the `I` function (as is): `I(class=="YES")` – user20650 Apr 25 '14 at 00:45
  • why would you do that? just multiply the coefficients by -1... Also `relevel(class, ref = 'YES')` – rawr Apr 25 '14 at 00:57
  • @user20650 can you please demonstrate where you put that function / insert that line of code? That seems to be the answer/ what I am looking for. – blast00 Apr 25 '14 at 04:26
  • @rawr I do not want to change the variable values. I have tons of code that relies on their values. Additionally, if I have a multinomial regression this is a pain. – blast00 Apr 25 '14 at 05:09
  • It would be transparent if you created a new variable, inverse of `class` and use that in the model. Since it will have a new name, all the output will be clear about which variable/reference was used. – Roman Luštrik Apr 25 '14 at 07:07
  • @blast00; You can use it as `glm(I(class=="YES") ~ x+y+z`. Note that while this will work for `binomial`, if your outcome has more than two levels it will result in comparing one level to the rest. Defining the factor levels before the regression is the best way to go here - it is only one line of code – user20650 Apr 25 '14 at 07:33
  • @rawr; apologies being thick..deleted – user20650 Apr 25 '14 at 07:36
  • user20650 - thank you here. Case closed. Post the answer. – blast00 Apr 25 '14 at 12:54
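
The `I(class == "YES")` suggestion from the comments can be sketched as follows. This is a minimal illustration on simulated data, not the asker's actual model: the data frame `auth_demo` and its single predictor `x` are made up here, and only the shape of the `glm()` call is the point.

```r
# Simulated stand-in for the asker's data (hypothetical names and values).
set.seed(1)
n <- 200
x <- rnorm(n)
p_yes <- plogis(-0.5 + 1.5 * x)  # true probability of "YES"
auth_demo <- data.frame(
  x = x,
  class = factor(ifelse(runif(n) < p_yes, "YES", "NO"))
)

# The response is a logical vector that is TRUE for "YES",
# so the model estimates P(class == "YES") regardless of factor level order.
fit_yes <- glm(I(class == "YES") ~ x,
               family = binomial(link = "logit"),
               data = auth_demo)

coef(fit_yes)          # log-odds of "YES"
head(fitted(fit_yes))  # estimated probabilities of "YES"
```

The `class` column itself is left untouched, which is what the question asks for. As noted in the comments, with more than two levels this compares one level against all the rest.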

2 Answers

Note that, when using `auth$class <- relevel(auth$class, ref = "YES")`, you are actually predicting "NO".

To predict "YES", the reference level must be "NO". Therefore, you have to use `auth$class <- relevel(auth$class, ref = "NO")`.

It's a common mistake, since most of the time the outcome variable is a vector of 0s and 1s and people want to predict 1.

But when such a vector is converted to a factor variable, the reference level is 0 (see below), so people effectively predict 1. Likewise, your reference level must be "NO" so that you predict "YES".

set.seed(1234)
x1 <- sample(c(0, 1), 50, replace = TRUE)
x2 <- factor(x1)
str(x2)
#Factor w/ 2 levels "0","1": 1 2 2 2 2 2 1 1 2 2 ...

You can see that the reference level is 0.
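
To check which outcome `glm()` ends up modeling, one option is to fit the same data with each reference level and compare the results. The sketch below uses simulated data with made-up names (`cls`, `x`); it relies on the documented behaviour that, for a factor response in a binomial model, the first level is treated as failure and all other levels as success.

```r
# Simulated data (hypothetical): higher x makes "YES" more likely.
set.seed(42)
n <- 300
x <- rnorm(n)
cls <- factor(ifelse(runif(n) < plogis(1.2 * x), "YES", "NO"))

levels(cls)  # default order is alphabetical: "NO", then "YES"

ref_no  <- relevel(cls, ref = "NO")   # "NO" is the reference, "YES" is modeled
ref_yes <- relevel(cls, ref = "YES")  # "YES" is the reference, "NO" is modeled

fit_no  <- glm(ref_no ~ x,  family = binomial)
fit_yes <- glm(ref_yes ~ x, family = binomial)

# The slopes have opposite signs, and the fitted probabilities are complements.
coef(fit_no)
coef(fit_yes)
all.equal(unname(fitted(fit_no)), 1 - unname(fitted(fit_yes)), tolerance = 1e-4)
```

Because the data were generated so that larger `x` favours "YES", the positive slope in `fit_no` confirms that `ref = "NO"` is the fit that predicts "YES".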
nghauran

Assuming you have class saved as a factor, use the relevel() function:

auth$class <- relevel(auth$class, ref = "YES")
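
If overwriting `auth$class` is not desirable, `relevel()` can also be called inside the model formula, which leaves the data frame as it was. The following is only a sketch on simulated data: `auth_demo` and `x` are hypothetical, and the level passed as `ref` should be whichever outcome is meant to be the baseline ("NO" here, so that "YES" is the modeled outcome).

```r
# Simulated data (hypothetical) whose factor levels start with "YES".
set.seed(3)
n <- 250
auth_demo <- data.frame(x = rnorm(n))
auth_demo$class <- factor(
  ifelse(runif(n) < plogis(0.8 * auth_demo$x), "YES", "NO"),
  levels = c("YES", "NO")
)
levels_before <- levels(auth_demo$class)

# relevel() is applied only when the model frame is built;
# the column stored in auth_demo is not modified.
fit_inline <- glm(relevel(class, ref = "NO") ~ x,
                  family = binomial,
                  data = auth_demo)

identical(levels(auth_demo$class), levels_before)  # original levels are intact
coef(fit_inline)                                   # log-odds of "YES"
```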
Phil
smrt1119
  • Note that, when using `auth$class <- relevel(auth$class, ref = "YES")`, you are actually predicting "NO", not "YES". – nghauran Aug 03 '18 at 08:09
  • if anyone doesn't have class saved as a factor, just do `relevel(factor(auth$class), ref="YES")` – hongpastry Dec 12 '20 at 04:08
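
For the multinomial part of the question, a similar idea applies. The sketch below assumes the `nnet` package is installed and uses its `multinom()` function, which takes the first factor level as the baseline category. The data, the level names, and the helper column `class_ref` are all made up for illustration; the releveled copy keeps the original `class` column unchanged.

```r
library(nnet)  # assumption: nnet is available (it ships with most R installs)

# Simulated three-level outcome (hypothetical data).
set.seed(7)
n <- 300
demo <- data.frame(x = rnorm(n))
demo$class <- factor(sample(c("LOW", "MID", "HIGH"), n, replace = TRUE))

levels(demo$class)  # alphabetical: "HIGH", "LOW", "MID"

# Store the releveled outcome under a new name so existing code that
# depends on `class` keeps working.
demo$class_ref <- relevel(demo$class, ref = "LOW")

fit_multi <- multinom(class_ref ~ x, data = demo, trace = FALSE)

# One row of coefficients per non-baseline level, each relative to "LOW".
rownames(coef(fit_multi))
```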