0

I'm using fixed effects logistic regression in R, using the glm function. I've done some reading about interpreting interaction terms in generalized linear models. When using the log odds, the model is linear and the interaction term(s) can be interpreted in the same way as OLS regression. When the coefficients are exponentiated into odds ratios, this is no longer the case. Since my audience are more familiar with odds ratios, i'd like to report my results using that metric.

Is there a pre-cooked way of calculating interaction terms as odds ratios using R? If, not, can anyone walk me through how this should be done?

Edit 1: I'm providing a reproducible example below.

set.seed(1234)

dat <- data.frame(
    Y = factor(round(runif(60))),
    x1 = rnorm(60, 10, 3),
    sex = sample(c("male", "female"), size = 60, prob = c(.4, .6), replace = TRUE),
    population = sample(c("France", "Kenya", "Thailand"), size = 60, prob = c(.3, .45, .25), replace = TRUE)
    )

fm1 <- glm(Y ~ x1 + sex * population, family = binomial(link = "logit"), data = dat)
summary(fm1)

# odds ratios
exp(coef(fm1))

Edit 2: additional clarification.

The motivation behind my question comes from the following explanation of logistic regression interactions from the UCLA statistics site:

http://www.ats.ucla.edu/stat/stata/seminars/interaction_sem/interaction_sem.htm

My understanding, from reading this, is that the interpretation of interaction terms that have been transformed into either odds ratios or probabilities is not the same as for the same terms in log odds units. I guess I'm trying to understand if I just need to change my interpretation of the interaction term when converting to odds ratios, or whether I need to do some calculation in addition to the exponentiation?

Chris
  • 401
  • 1
  • 5
  • 10
  • Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. – Ben Bolker Jun 23 '14 at 01:42
  • You're asking how to exponentiate the coefficients? – rawr Jun 23 '14 at 02:05
  • @rawr No, my understanding is that exponentiated coefficients for interaction terms *cannot* be interpreted in the same way as interaction terms in OLS (whereas those in the log odds can). I'm trying to calculate interaction terms in odds ratios the correct way. – Chris Jun 23 '14 at 02:14
  • `p/q = product of exp(beta_i)`, where the betas are the coefficients of the linear predictor eta (this does not depend on whether the betas come from an interaction term or not). – James King Jun 23 '14 at 02:22
  • @user3114046, so are you saying that this isn't an issue? The exponentiated interaction term in the above model would provide the effect of the difference-in-difference in the odds between `sex` and `population`? – Chris Jun 23 '14 at 02:29
  • I looked at your link for a few minutes and I can't really follow what the author is talking about. Maybe you would have better luck at http://stats.stackexchange.com/. – James King Jun 23 '14 at 02:56

1 Answers1

2

If you are talking about the interpretation of the glm() output and remain on the log-odds scale than it is exactly analogous to how you would interpret the output from lm(). In both cases it is better to talk about predictions rather than trying to separately interpret the coefficients. When you ask for a "pre-cooked way of calculating interaction terms as odds ratios using R", it is not clear what you are really requesting. Do you know of such a "pre-cooked way of calculating interaction terms" for lm() model output?

The UCLA tutorial is saying that you should ask for a method of looking at probabilities and in R regressions functions the answer is "predict":

?predict.glm

This is the set of sum of the linear predictors, i.e the sum of the Intercept and the coefficients for persons with the unique combinations of categorical features at the sample mean for x1 in this dataset:

> data.frame( expand.grid(sex=unique(dat$sex), population=unique(dat$population)), x1=mean(dat$x1))
     sex population       x1
1 female      Kenya 9.380473
2   male      Kenya 9.380473
3 female     France 9.380473
4   male     France 9.380473
5 female   Thailand 9.380473
6   male   Thailand 9.380473
> predict( fm1, newdata=data.frame( expand.grid(sex=unique(dat$sex), population=unique(dat$population)), x1=mean(dat$x1)))
         1          2          3          4          5          6 
-0.1548962  0.4757249 -0.5963092 -0.3471242  0.8477717  0.2029501 

Those could be exponentiated if odds ratios are desired, but you should then know what is in the denominator for the odds ratios. And these are the probabilities (obtained with type='response'):

> predict( fm1, newdata=data.frame( expand.grid(sex=unique(dat$sex), population=unique(dat$population)), x1=mean(dat$x1)), type="response")
        1         2         3         4         5         6 
0.4613532 0.6167379 0.3551885 0.4140800 0.7000995 0.5505641 
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thanks, that definitely helps. I guess i'm still confused as to how I would know what the denominator is for the odds ratios in your above example (if the first set of predicted values were exponentiated)? If I use `exp(coef(fm1))`, the 1.46 coefficient for `sexmale:populationKenya` tells me (I think) that there is 46% difference in the difference of the odds between males-females in France and Kenya. I don't get the same value from the odds-ratio predictions (1.61 - 0.86) - (0.71 - 0.55) = 0.59 – Chris Jun 24 '14 at 01:30
  • Right. The baseline (for coefficients) is the mean value of x1 within the composite factor reference category or categories, in this case female:France due to teh lexical default ordering of factor levels. – IRTFM Jun 24 '14 at 01:37
  • It makes sense to me that female:France would be the default baseline. I'm still not clear how that baseline [exp(-0.596) = 0.551] relates to the other odds-ratios? Do I have to perform some sort of conversion? – Chris Jun 24 '14 at 02:11