Given that the outcome variable in a data frame is a factor, when regressing the DV onto a set of IVs, what is the model predicting? The probability that the DV is the first level of the factor? Or the second?

A related question: I know that given a numeric column of 1s and 0s, a logistic regression models the probability of the higher value (i.e., value = 1), so I have been attempting to recode the factor ("character") variable as numeric. I am coming from a SAS background, so I am entirely too used to writing if var = "yes" then var_num = 1; else var_num = 0;

That's clearly wrong in R. What's the most efficient way you have found to recode such variables?

gh0strider18

1 Answer

If you have a factor variable with just two levels and are using logistic regression, then R will treat the first level as "no event" (0) and the second level as "success" (1). You can view the order of the levels with levels(dataframe$columnname).
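As a minimal sketch of where that order comes from: unless levels are given explicitly, factor() sorts the unique values, so the alphabetically first value becomes the first (reference) level. The vector `answers` below is a made-up illustration, not data from the question.

```r
# Hypothetical yes/no responses (illustration only).
answers <- c("yes", "no", "no", "yes", "yes")

# Default: levels are the sorted unique values, so "no" comes first.
f_default <- factor(answers)
levels(f_default)

# Explicit order: name the levels yourself to control which one is first.
f_explicit <- factor(answers, levels = c("yes", "no"))
levels(f_explicit)
```

With the default order, "no" would be treated as the non-event and "yes" as the event.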

If you want to change the reference level, then relevel will do the trick:

dd$gender <- relevel(dd$gender, "male")
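A small self-contained sketch of what relevel does, using a made-up `gender` vector (the dd$gender column above is not shown in the question, so these values are an assumption): the named level moves to the front, and the data values themselves are unchanged.

```r
# Hypothetical data (illustration only).
gender <- factor(c("female", "male", "male", "female"))
levels(gender)      # default order is alphabetical

# Move "male" to the front so it becomes the reference level.
gender_releveled <- relevel(gender, ref = "male")
levels(gender_releveled)

# The observations are the same; only the level order differs.
identical(as.character(gender), as.character(gender_releveled))
```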

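For example, consider the data

# x is uniform on [0, 1]; larger x makes "event" more likely
dd <- data.frame(x = runif(50))
# factor() is needed in R >= 4.0, where strings are no longer converted to factors by default
dd <- transform(dd, outcome = factor(ifelse(runif(50) < x, "event", "noevent")))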

levels(dd$outcome)
# [1] "event"   "noevent"

with(dd, table(lessthanhalf=x<.5, outcome))
#             outcome
# lessthanhalf event noevent
#        FALSE    15       8
#        TRUE      6      21

Here we can see that increasing x values are associated with more "events". We can model this with

glm(outcome~x, dd, family=binomial)

# Call:  glm(formula = outcome ~ x, family = binomial, data = dd)
# 
# Coefficients:
# (Intercept)            x  
#       2.773       -4.990  

By default, we are modeling the probability of "noevent" (the second level), so as x increases, the probability of "noevent" decreases. We can instead model the probability of "event" by making "noevent" the reference category:

glm(relevel(outcome,"noevent")~x, dd, family=binomial)

# Call:  glm(formula = relevel(outcome, "noevent") ~ x, family = binomial, 
#     data = dd)

# Coefficients:
# (Intercept)            x  
#      -2.773        4.990 
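
For the recoding part of the question, one possible equivalent of the SAS if/else is a logical comparison converted to an integer. The sketch below regenerates similar data with an arbitrary seed (so the numbers will differ from the output above) and uses a hypothetical column name `outcome_num`. It checks that the 0/1 coding and the releveled factor give matching coefficients.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
dd <- data.frame(x = runif(50))
dd$outcome <- factor(ifelse(runif(50) < dd$x, "event", "noevent"))

# Equivalent of: if outcome = "event" then outcome_num = 1; else outcome_num = 0;
dd$outcome_num <- as.integer(dd$outcome == "event")

# Fit on the 0/1 column and on the factor with "noevent" as reference.
fit_num <- glm(outcome_num ~ x, data = dd, family = binomial)
fit_fac <- glm(relevel(outcome, "noevent") ~ x, data = dd, family = binomial)

# Both model the probability of "event", so the estimates should agree.
all.equal(unname(coef(fit_num)), unname(coef(fit_fac)))
```

Either approach works; releveling keeps the labels attached to the data, while the numeric column makes the modeled event explicit.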
MrFlick
    Thank you very much! This is what I was looking for. I hope others can find this because the explanation was complete and well explained. – gh0strider18 Jun 30 '14 at 17:54