
I use patsy to build a design matrix, and I need to include powers of the original factors. For example, for the regression y ~ x1 + x1^2 + x2 + x2^2 + x2^3, I want to be able to write

patsy.dmatrix('y~x1 + x1**2 + x2 + x2**2 + x2**3', data)

where data is a DataFrame that contains the columns y, x1, and x2. But it does not seem to work at all. Any solutions?

Tom Bennett

2 Answers


Patsy has a special interpretation of ** that it inherited from R: it expands interactions between terms instead of raising values to a power, so x1**2 reduces to plain x1. I've considered making it automatically do the right thing when applied to numeric factors, but haven't actually implemented it. In the meantime, there's a general method for telling patsy to use the Python interpretation of operators instead of the Patsy interpretation: wrap your expression in I(...). So:

patsy.dmatrices('y ~ x1 + I(x1**2) + x2 + I(x2**2) + I(x2**3)', data)

(More detailed explanation here)
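To see the difference concretely, here is a minimal sketch comparing the two interpretations. The numbers in the DataFrame are made up for illustration; only the column names y, x1, and x2 come from the question. It calls dmatrices for the full model because the formula has a left-hand side, which dmatrix does not accept.

```python
import numpy as np
import pandas as pd
import patsy

# Made-up data; only the column names come from the question.
data = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 3.0, 5.0, 7.0],
})

# Patsy's own `**`: x1**2 collapses back to x1, so no squared column appears.
plain = patsy.dmatrix("x1 + x1**2", data)
plain_columns = plain.design_info.column_names

# Inside I(...), `**` is ordinary Python exponentiation.
y, X = patsy.dmatrices("y ~ x1 + I(x1**2) + x2 + I(x2**2) + I(x2**3)", data)
power_columns = X.design_info.column_names

print(plain_columns)
print(power_columns)
```

Printing the column names is a quick way to check which terms patsy actually built before fitting anything.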

Nathaniel J. Smith

Patsy does not seem to handle powers directly (yet?). A workaround can be found here: python stats models - quadratic term in regression
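One workaround along these lines is to call a NumPy function inside the formula, since patsy evaluates function calls as ordinary Python. This is a sketch under my own assumptions, not a quote from the linked question: the data values are invented, and np has to be importable in the scope where the formula is evaluated.

```python
import numpy as np
import pandas as pd
import patsy

# Invented data; only the column names x1 and x2 come from the question.
data = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [2.0, 3.0, 5.0, 7.0],
})

# np.power(...) is evaluated as a plain Python call, so each call
# becomes its own numeric column in the design matrix.
X = patsy.dmatrix(
    "x1 + np.power(x1, 2) + x2 + np.power(x2, 2) + np.power(x2, 3)",
    data,
)
design = np.asarray(X)
print(X.design_info.column_names)
```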

Christian O'Reilly