Here is the R code:
model1 <- glm(
  wt82_71 ~ qsmk + sex + race + poly(age, 2, raw = TRUE) + education +
    poly(smokeintensity, 2, raw = TRUE) + poly(smokeyrs, 2, raw = TRUE) +
    exercise + active + poly(wt71, 2, raw = TRUE) + qsmk:smokeintensity,
  data = nhefs
)
In Python I wrote:
mod3 = smf.glm(
    formula='qsmk ~ sex + race + education + exercise + active + poly(age,2) + poly(smokeintensity,2) + poly(smokeyrs,2) + poly(wt71,2)',
    family=sm.families.Binomial(),
    data=nhefs
).fit()
mod3.summary()
What would poly() be in Python?
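For context: as far as I can tell, patsy (the formula engine behind smf.glm) does not ship a poly() transform, so the formula above fails on the poly terms. One workaround, assuming nhefs is already loaded as a DataFrame and that raw (non-orthogonal) polynomials are acceptable as with raw = TRUE in the R code, is to spell out the quadratic terms with I(). This is only a sketch of that idea, not a definitive translation:

# Sketch: raw quadratic terms via I(), since patsy has no built-in poly().
# Assumes nhefs is already loaded as a pandas DataFrame.
import statsmodels.api as sm
import statsmodels.formula.api as smf

mod3 = smf.glm(
    formula=('qsmk ~ sex + race + education + exercise + active'
             ' + age + I(age**2)'
             ' + smokeintensity + I(smokeintensity**2)'
             ' + smokeyrs + I(smokeyrs**2)'
             ' + wt71 + I(wt71**2)'),
    family=sm.families.Binomial(),
    data=nhefs,
).fit()
print(mod3.summary())

Since patsy resolves unknown names from the calling namespace, defining your own poly(x, degree) function and keeping the original formula should also work; and if education, exercise, and active are categorical, wrapping them in C() would mirror R's factor handling.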
Here are some comments that accompany the R code:
model 1: regression on covariates, allowing for some effect modification
notes:
(1) poly(x, 2) adds an orthogonal polynomial of degree 2; add the argument raw = TRUE if you want it to produce the same coefficients as x + x^2 would
(2) x1*x2 enters the main effects of x1 and x2 plus their product term, while x1:x2 enters just the product term. The latter is needed here for smokeintensity: we want smokeintensity treated linearly in the interaction but quadratically in the main effect, so a separate linear term for smokeintensity is not estimable (see the Python sketch after these notes)
(3) observations with missing values are automatically deleted
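To make notes (1) and (2) concrete, here is a rough Python sketch of model1 using the same I() workaround. The choice of smf.ols is my substitution for R's glm with its default gaussian family (the same linear model), and nhefs is again assumed to be loaded:

# Sketch of model1: raw quadratics via I() (note 1) and only the product
# term qsmk:smokeintensity, not qsmk*smokeintensity (note 2).
import statsmodels.formula.api as smf

model1 = smf.ols(
    formula=('wt82_71 ~ qsmk + sex + race + education + exercise + active'
             ' + age + I(age**2)'
             ' + smokeintensity + I(smokeintensity**2)'
             ' + smokeyrs + I(smokeyrs**2)'
             ' + wt71 + I(wt71**2)'
             ' + qsmk:smokeintensity'),
    # Rows with missing values in the model variables are dropped by
    # patsy's default NA handling, similar to R's behavior (note 3).
    data=nhefs,
).fit()
print(model1.summary())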