
In R, the I() function is used inside a model formula to create a new predictor for linear regression, such as X^2. For example:

library(MASS)  # provides the Boston data set
lm.fit2 = lm(medv ~ lstat + I(lstat^2), data = Boston)

A good explanation is given in the question "What does the capital letter "I" in R linear regression formula mean?".
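For readers less familiar with R, here is a minimal numpy sketch of what the formula medv ~ lstat + I(lstat^2) fits: a design matrix with an intercept column, lstat, and lstat squared, solved by ordinary least squares. The lstat and medv values are made up (an assumption, not the Boston data) and generated from an exact quadratic so the recovered coefficients are easy to check.

```python
import numpy as np

# Synthetic stand-ins for lstat and medv (assumption: not the real Boston data).
lstat = np.array([2.0, 5.0, 8.0, 12.0, 17.0, 25.0])
medv = 40.0 - 2.0 * lstat + 0.05 * lstat**2  # exact quadratic, no noise

# Model matrix for medv ~ lstat + I(lstat^2): intercept, lstat, lstat squared.
X = np.column_stack([np.ones_like(lstat), lstat, lstat**2])

# Ordinary least squares via numpy's solver.
coef, *_ = np.linalg.lstsq(X, medv, rcond=None)
print(coef)  # approximately [40.0, -2.0, 0.05]
```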

I'm trying to fit the same regression in Python with statsmodels and the same formula, but I can't find the equivalent. This code works for a single variable:

import statsmodels.formula.api as smf
fit3 = smf.ols('medv~lstat', data=data).fit()
print(fit3.summary())

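but if I try the snippet below, it doesn't work correctly: the formula parser treats ** as an interaction operator rather than exponentiation, so lstat**2 collapses to plain lstat and no squared term is added.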

fit3 = smf.ols('medv~lstat + lstat**2', data=data).fit()
print(fit3.summary())
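One way to see why the snippet above misbehaves is to build the design matrices directly with patsy, the formula parser statsmodels uses. This is a small sketch on a made-up DataFrame (the values are hypothetical); it compares the columns produced by lstat**2 with and without I().

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data; only the column name matches the question.
df = pd.DataFrame({"lstat": [4.0, 9.0, 15.0, 22.0]})

# Outside I(), ** is the formula power operator: it interacts lstat with
# itself, and that interaction collapses back to plain lstat.
plain = dmatrix("lstat + lstat**2", df)
print(plain.design_info.column_names)  # ['Intercept', 'lstat']

# Inside I(), the expression is evaluated as ordinary Python, so ** squares.
wrapped = dmatrix("lstat + I(lstat**2)", df)
print(wrapped.design_info.column_names)
print(np.asarray(wrapped))
```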

Trying the ^ operator doesn't make sense either, since Python interprets that symbol as bitwise XOR. Is there an equivalent of R's I() function in Python?
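A quick check of the operator point above, using plain Python numbers only: ^ is bitwise XOR on integers and is not defined for floats, while ** is exponentiation. This is why R's I(lstat^2) has to become something with ** on the Python side.

```python
# Python's ^ is bitwise XOR on integers; ** is the power operator.
xor_result = 5 ^ 2     # 0b101 XOR 0b010 = 0b111
power_result = 5 ** 2
print(xor_result, power_result)  # 7 25

# On floats (like a continuous predictor such as lstat), ^ is not defined.
try:
    5.0 ^ 2
    float_xor_failed = False
except TypeError as err:
    float_xor_failed = True
    print("TypeError:", err)
```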


1 Answer


I found the answer; it seems to be as simple as wrapping the squared term in I(), just as in R:

import statsmodels.formula.api as smf
f = 'medv~lstat + I(lstat**2)'  # I() makes ** ordinary exponentiation
fit3 = smf.ols(f, data=data).fit()
print(fit3.summary())
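Here is a self-contained sketch of the answer's formula, runnable without the Boston data. The DataFrame is synthetic (an assumption: random lstat values and a made-up quadratic medv with noise), and the result is cross-checked against numpy's degree-2 polynomial fit, which solves the same least-squares problem.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the Boston columns (assumption: not the real data set).
rng = np.random.default_rng(0)
lstat = rng.uniform(2, 35, size=200)
medv = 42.0 - 2.3 * lstat + 0.04 * lstat**2 + rng.normal(0, 2.0, size=200)
data = pd.DataFrame({"lstat": lstat, "medv": medv})

# I() tells the formula parser to evaluate lstat**2 as Python arithmetic.
fit = smf.ols("medv ~ lstat + I(lstat**2)", data=data).fit()
print(fit.params)

# Cross-check with numpy; polyfit returns the highest degree first,
# so reverse it to (intercept, lstat, lstat^2).
poly = np.polyfit(data["lstat"], data["medv"], deg=2)[::-1]
print(np.allclose(fit.params.to_numpy(), poly))  # True
```

Both routes minimize the same squared error, so any difference in the coefficients should be at the level of floating-point noise.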
`statsmodels` uses `patsy` to parse R-like formulas. [Here's the quickstart](https://patsy.readthedocs.io/en/latest/quickstart.html) which includes the above answer as well as lots of other information. – Aaron Mar 23 '21 at 04:40
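Following the pointer to patsy: as I understand its formula language, a function call is evaluated as a Python expression, so I() should only be needed when the expression uses an operator the formula syntax reinterprets (such as + or **). This sketch, on a hypothetical toy DataFrame, compares np.power(lstat, 2) with I(lstat**2); it assumes np is importable in the calling scope, since patsy looks names up there.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical toy data.
df = pd.DataFrame({"lstat": [4.0, 9.0, 15.0, 22.0]})

# The function call is evaluated directly, so no I() wrapper is needed.
via_call = dmatrix("lstat + np.power(lstat, 2)", df)

# The operator form needs I() so that ** is not read as the formula power operator.
via_I = dmatrix("lstat + I(lstat**2)", df)

print(np.allclose(np.asarray(via_call), np.asarray(via_I)))  # True
```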