R-specific glm logistic regression - what am I modeling?

Question

Given that the outcome variable in a data frame is a factor, when regressing the DV onto a set of IVs, what is the model predicting? The probability that the DV is the first level of the factor? Or the second?

A related question - I know that given a numerical column of 1s and 0s, a logistic regression would model the probability of the higher value (i.e., value=1), so I have been attempting to recode the factor "character" variable into a numeric one. I am coming from a SAS background, so I am entirely too used to if var = "yes" then var_num = 1; else var_num=0;

That's clearly wrong. What's the most efficient way you have found to recode such variables?

This question appears to be off-topic because it is about statistical understanding. Try [stats.se] instead. — Gavin Simpson, Jun 30 '14 at 17:23
Actually I believe it to be directly related to R. It is unclear from the output which factor level R uses for "success" when doing logistic regression. — MrFlick, Jun 30 '14 at 17:29

score 2 · Accepted Answer · answered Jun 30 '14 at 17:27

If you have a factor variable with just two levels and are using a logistic regression, then R will treat the first level as "failure" (no event, 0) and the second level as "success" (1). You can view the order of the levels with levels(dataframe$columnname).
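To make that coding concrete, and to cover the recoding part of the question, here is a minimal sketch. The data frame `df`, the column `answer`, and the seed are made up for illustration; they are not from the question. It fits the same model once with the factor response and once with an explicit 0/1 recoding, which is the R counterpart of the SAS if/else step.

```r
# Hypothetical data: 'answer' is a made-up two-level factor ("no", "yes")
set.seed(1)
df <- data.frame(x = runif(100))
df$answer <- factor(ifelse(runif(100) < df$x, "yes", "no"))

# Levels are sorted alphabetically by default: "no" first, "yes" second
levels(df$answer)

# Fit once with the factor response ...
fit_factor <- glm(answer ~ x, data = df, family = binomial)

# ... and once with an explicit recoding: 0 for the first level, 1 otherwise
df$answer01 <- as.integer(df$answer != levels(df$answer)[1])
fit_numeric <- glm(answer01 ~ x, data = df, family = binomial)

# Both fits should give the same coefficients, because binomial() treats
# any level other than the first as "success"
all.equal(coef(fit_factor), coef(fit_numeric))
```

If you prefer to name the success value explicitly, `as.integer(df$answer == "yes")` gives the same 0/1 column here.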

If you want to change the reference level, relevel() will do the trick:

dd$gender <- relevel(dd$gender, "male")
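As a quick check of what relevel() does to the level order, here is a small sketch. The `gender` vector is made-up illustration data.

```r
# Hypothetical two-level factor; levels default to alphabetical order
gender <- factor(c("male", "female", "female", "male"))
levels(gender)
# [1] "female" "male"

# Move "male" to the front so it becomes the reference (first) level
releveled <- relevel(gender, ref = "male")
levels(releveled)
# [1] "male"   "female"

# The data values themselves are unchanged; only the level order differs
identical(as.character(gender), as.character(releveled))
# [1] TRUE
```

Setting the order directly with `factor(gender, levels = c("male", "female"))` has the same effect.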

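For example, consider the data

# factor() is needed in R >= 4.0, where data frames no longer convert strings to factors
# no seed is set, so the exact counts and coefficients will vary between runs
dd<-data.frame(x=runif(50))
dd<-transform(dd,outcome=factor(ifelse(runif(50)<x,"event","noevent")))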

levels(dd$outcome)
# [1] "event"   "noevent"

with(dd, table(lessthanhalf=x<.5, outcome))
#             outcome
# lessthanhalf event noevent
#        FALSE    15       8
#        TRUE      6      21

Here we can see that increasing x values are associated with more "events". We can model this with

glm(outcome~x, dd, family=binomial)

# Call:  glm(formula = outcome ~ x, family = binomial, data = dd)
# 
# Coefficients:
# (Intercept)            x  
#       2.773       -4.990

By default, we are modeling the probability of "noevent", so as x increases the probability of "noevent" decreases. We can instead model the probability of "event" by making "noevent" the reference category:

glm(relevel(outcome,"noevent")~x, dd, family=binomial)

# Call:  glm(formula = relevel(outcome, "noevent") ~ x, family = binomial, 
#     data = dd)
# 
# Coefficients:
# (Intercept)            x  
#      -2.773        4.990
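One way to convince yourself that the two fits describe the same model from opposite sides is to compare their coefficients and fitted probabilities. This is a sketch on freshly simulated data; the name `sim` and the seed are assumptions for illustration, so the numbers will not match the output above.

```r
# Simulated data in the same spirit as the example above (hypothetical seed)
set.seed(42)
sim <- data.frame(x = runif(50))
sim$outcome <- factor(ifelse(runif(50) < sim$x, "event", "noevent"))

# Default level order: "event" is first, so "noevent" is modeled as success
fit_noevent <- glm(outcome ~ x, data = sim, family = binomial)

# After relevel(), "noevent" is first, so "event" is modeled as success
fit_event <- glm(relevel(outcome, "noevent") ~ x, data = sim, family = binomial)

# The coefficients should be equal in size with opposite signs
all.equal(unname(coef(fit_noevent)), -unname(coef(fit_event)))

# The fitted probabilities of the two outcomes should sum to 1 per row
p_noevent <- predict(fit_noevent, type = "response")
p_event   <- predict(fit_event, type = "response")
all.equal(unname(p_noevent + p_event), rep(1, nrow(sim)))
```

Both comparisons are expected to return TRUE up to numerical tolerance.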

Thank you very much! This is what I was looking for. I hope others can find this because the explanation was complete and well explained. — gh0strider18, Jun 30 '14 at 17:54
