4

Here is the R code

model1 <- glm(wt82_71 ~ qsmk + sex + race + poly(age, 2, raw = TRUE)   + education + poly(smokeintensity, 2, raw = TRUE) + poly(smokeyrs, 2, raw = TRUE) + exercise + active + poly(wt71, 2, raw = TRUE) + qsmk:smokeintensity,data = nhefs)

In Python I wrote:

mod3 = smf.glm(formula='qsmk ~ sex + race + education + exercise + active + poly(age,2) + poly(smokeintensity,2) + poly(smokeyrs,2) + poly(wt71,2)', family=sm.families.Binomial(), data=nhefs).fit()
mod3.summary()

What would poly() be in python? Here is some comments model 1: regression on covariates, allowing for some effect modification notes:

(1) poly(x, 2) adds an orthogonal polynomial of degree 2, add the argument raw = TRUE if you want it to produce the same coefficients as would x + x^2

(2) x1*x2 enters the main effects of x1 and x2 and their product term x1:x2 enters just the product term (necessary here for smokeintensity because we want smokeintensity treated linearly in the interaction but quadratically in the main effect and thus a linear term for smokeintensity is not estimable)

(3) observations with missing values are automatically deleted

1 Answers1

4

AFAIK, poly for polynomial basis functions of continuous variables are not yet supported by patsy. The existing poly is for ordered categorical variables.

Numpy has vander functions for various polynomial bases that can be used directly in a formula.

Whether orthogonalizing on an existing dataset is useful or not is debatable. I prefer not to because then the basis functions do not change when the dataset changes.

https://github.com/pydata/patsy/issues/20 https://github.com/pydata/patsy/pull/92/files

As alternative, it is possible to directly specify power terms, see python stats models - quadratic term in regression but that will not orthogonalize.

Josef
  • 21,998
  • 3
  • 54
  • 67