
In R you can write model='Lottery ~ (Literacy + Wealth + Region)^k' and get the main effects plus every interaction of those variables up to order k.

statsmodels supports R-style formulas for OLS regression, but it does not seem to support the ^k syntax. My dataset is large enough that manually trying combinations of variables is impractical, so I am looking for a way to automate the search for interaction effects.

Josef
LMGagne
  • Great question. I'm sure you've looked at this: https://www.statsmodels.org/devel/example_formulas.html#multiplicative-interactions – parsethis Jun 09 '20 at 21:17
  • @parsethis yes! That's actually part of why I decided to post. statsmodels doesn't have any documentation to indicate they support this functionality (I also tried a few different ways and got a syntax error), but I'm hoping that someone can either prove me wrong or point me in the direction of a package that does support this functionality, regardless of exact syntax. – LMGagne Jun 09 '20 at 22:02

1 Answer


Formulas are handled by patsy and not by statsmodels directly.

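According to the patsy documentation, the power operator, as in (a + b + c + d) ** 3, works for interaction effects of categorical variables: it expands to all main effects plus all interactions up to the given order (here, third order).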

See the section on ** in https://patsy.readthedocs.io/en/latest/formulas.html#the-formula-language

Aside: power in Python is ** and not ^
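
To check what ** expands to before fitting anything, one option is to ask patsy for the parsed terms. This is a minimal sketch, assuming patsy is installed. The column names are taken from the question's formula, and the helper expand_terms is a hypothetical name used only for illustration. No data is needed, because only the formula string is parsed.

```python
from patsy import ModelDesc


def expand_terms(formula):
    """Return the right-hand-side term names patsy derives from a formula.

    Hypothetical helper for illustration; it only parses the formula
    string and does not touch any data.
    """
    desc = ModelDesc.from_formula(formula)
    return [term.name() for term in desc.rhs_termlist]


# Python-style power operator: ** rather than R's ^
terms = expand_terms("Lottery ~ (Literacy + Wealth + Region) ** 2")
print(terms)
```

The printed list should contain the intercept, the three main effects, and the three two-way interactions. Changing the exponent to 3 should also add the three-way term. The same formula string can then be passed to statsmodels, for example through statsmodels.formula.api.ols(formula, data=df), assuming df is a DataFrame holding those columns.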

Josef