I am building a logistic regression model with a continuous predictor and a factor with 12 levels (months). I am interested in how the effect of the continuous predictor varies by month, i.e. the interaction effects.

glm(formula = PQR.dep ~ multi.month.data * Month, family = binomial, 
data = training)

When I review the output, however, the initial factor value (January) seems to be implicit.

How do I either 1) explicitly show the value of that factor level, or 2) determine what its coefficient and Pr(>|z|) are?

Coefficients:
                           Estimate Std. Error z value            Pr(>|z|)    
(Intercept)                -1.32310    0.16057   -8.24 <0.0000000000000002 ***
multi.month.data            -0.08626    0.39769   -0.22                0.83    
Month02Feb                  0.05221    0.22231    0.23                0.81    
Month03Mar                 -0.17425    0.22824   -0.76                0.45    
Month04Apr                  0.06336    0.22680    0.28                0.78    
.
.
.  
Month12Dec                   0.05221    0.22231    0.23                0.81
multi.month.data:Month02Feb  0.49568    0.51903    0.96                0.34    
multi.month.data:Month03Mar  0.44301    0.57446    0.77                0.44    
multi.month.data:Month04Apr  0.88472    0.60063    1.47                0.14    
.
.
.  
multi.month.data:Month12Dec  0.88472    0.60063    1.47                0.14

In the example above, how do I determine the value of Month01Jan and multi.month.data:Month01Jan?

Dino Fire
  • Very similar to http://stackoverflow.com/questions/7337761/linear-regression-na-estimate-just-for-last-coefficient/7341074#7341074 (perhaps a duplicate) – Ben Bolker Nov 05 '13 at 16:30

2 Answers

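When a model includes a factor, R by default uses treatment contrasts: the first level (here 01Jan) becomes the reference level, and its coefficient is fixed at 0 by construction, so it gets no row, standard error, or Pr(>|z|) in the output. The Month rows are the effect of each month relative to January, and the interaction rows are the change in the multi.month.data slope relative to January. The January values themselves are the (Intercept) row and the multi.month.data row.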
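
As a rough sketch on simulated data (the data frame `sim` and its columns `x` and `y` are made up for illustration and are not the asker's `training` data), the January values can be read from the `(Intercept)` and main-effect rows. Refitting the same model without a global intercept prints one intercept and one slope per month, January included:

```r
# Simulated stand-in for the asker's data; all names and values are hypothetical.
set.seed(1)
lev <- c("01Jan", "02Feb", "03Mar")
sim <- data.frame(
  Month = factor(sample(lev, 600, replace = TRUE), levels = lev),
  x     = runif(600)
)
sim$y <- rbinom(600, 1, plogis(-1 + 0.5 * sim$x))

# Default treatment contrasts: January is the reference level.
fit_ref <- glm(y ~ x * Month, family = binomial, data = sim)

# Same model, reparameterized: one intercept and one slope per month.
fit_all <- glm(y ~ 0 + Month + Month:x, family = binomial, data = sim)

# January intercept and slope under the default coding
coef(summary(fit_ref))[c("(Intercept)", "x"), ]

# The same two estimates, now labelled by month
coef(summary(fit_all))[c("Month01Jan", "Month01Jan:x"), ]
```

For the model in the question, the analogous formula would presumably be `PQR.dep ~ 0 + Month + Month:multi.month.data`. Note that in this parameterization each month's Pr(>|z|) tests that month's coefficient against 0, not against January.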

Christopher Louden

If you go back to logistic regression basics, it is possible to construct an estimate for the probability for the baseline level (month=Jan) using only the intercept and the proportions of subjects in the lowest category, but with R, it's far easier to use the predict function.

mod1 <- glm(formula = PQR.dep ~ multi.month.data * Month, 
                  family = binomial, data = training)
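# Predicted probabilities for the baseline month across the observed range
# of the continuous predictor
predict(mod1, 
        newdata = data.frame(Month = "01Jan", 
                             multi.month.data = with(training,
                                          seq(min(multi.month.data), 
                                              max(multi.month.data),
                                              length.out = 10))),
        type = "response")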

(I'm taking an educated guess at what your value for the baseline 'Month' level might be.)
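
One way to read the "basics" route, sketched on simulated data with hypothetical names (`sim`, `x`, `y`, and the value `x0` are assumptions for illustration): for the reference month the linear predictor is just the intercept plus the main-effect slope times the covariate, and `plogis()` converts it to a probability. The hand calculation should agree with `predict()`:

```r
# Simulated stand-in for the asker's data; all names and values are hypothetical.
set.seed(2)
lev <- c("01Jan", "02Feb", "03Mar")
sim <- data.frame(
  Month = factor(sample(lev, 600, replace = TRUE), levels = lev),
  x     = runif(600)
)
sim$y <- rbinom(600, 1, plogis(-1 + 0.5 * sim$x))
fit <- glm(y ~ x * Month, family = binomial, data = sim)

x0 <- 0.3        # arbitrary covariate value chosen for illustration
b  <- coef(fit)

# January is the reference level, so only the intercept and main slope apply.
p_hand <- plogis(b[["(Intercept)"]] + b[["x"]] * x0)

# The same quantity via predict()
p_pred <- predict(fit,
                  newdata = data.frame(Month = "01Jan", x = x0),
                  type = "response")

all.equal(p_hand, unname(p_pred))
```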

IRTFM