I need to perform a rolling linear regression for X periods at a time. I have the following pandas dataframe:

   value
0   4354
1   7564
2    657
3   7876

I can perform a linear regression on the whole dataframe by using scipy as follows:

from scipy import stats

slope, intercept, r_value, p_value, std_err = stats.linregress(df.index, df['value'])

And then to get the linear regression line I do:

df['linreg'] = intercept + slope * df.index

What I have been unable to figure out is how to do a rolling linear regression, for example with a 20-row rolling window.

darkpool
  • You probably want [`rolling_apply`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.rolling_apply.html#pandas.rolling_apply) – EdChum Sep 02 '15 at 12:29
  • Select 20 rows and call linregress on them; repeat with 20 more rows. – ev-br Sep 02 '15 at 12:48
  • @darkpool You've specified a `value` and an `index`. What are you looking to perform your linear regression on? In other words, in the standard linear form of `Y=a+bX`, what is your `X` and `Y` in your dataset? I'm guessing your `value` would be `Y` and your `index` would be `X`. Or are you looking to run your analysis on `X` with `lagged terms of X`? Or is your sample data misrepresenting your intentions, meaning that you would like to regress value on *another* value in another column? – vestland Feb 26 '19 at 19:10
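
The suggestion in the comments (take one window of rows, call `linregress` on it, then move on) can be sketched as follows. This is only a minimal illustration: the helper name `rolling_linregress` is made up here, it assumes `X` is the row position and `Y` is `value` as in the question, and it uses a window of 3 because the sample frame has only four rows.

```python
import numpy as np
import pandas as pd
from scipy import stats


def rolling_linregress(values, window):
    """Fit a separate least-squares line to every run of `window` consecutive rows.

    Hypothetical helper: x is the row position (0, 1, 2, ...), y is `values`.
    The first window - 1 rows have no complete window and stay NaN.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope = np.full(len(y), np.nan)
    intercept = np.full(len(y), np.nan)
    for end in range(window, len(y) + 1):
        start = end - window
        res = stats.linregress(x[start:end], y[start:end])
        # Store the fit on the last row of the window
        slope[end - 1] = res.slope
        intercept[end - 1] = res.intercept
    return pd.DataFrame({"slope": slope, "intercept": intercept})


df = pd.DataFrame({"value": [4354, 7564, 657, 7876]})
fit = rolling_linregress(df["value"], window=3)

# Value of each window's fitted line at that window's last row
df["linreg"] = fit["intercept"] + fit["slope"] * np.arange(len(df))
```

This refits every window from scratch, so it costs O(window) work per row. That is usually fine for a 20-row window on a modest frame.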

1 Answer

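Simple linear regression needs only five running sums over the window: ΣXi, ΣXi², ΣYi, ΣYi² and ΣXi·Yi (ΣYi² is needed only for the r value and standard error, not for the slope and intercept). You can maintain these as the window rolls, adding each new point and subtracting the one that drops out, so each step costs O(1) instead of a refit over all rows in the window.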

This is exact for integer data as long as the sums stay within the range the machine can represent exactly. With floating-point data, rounding errors accumulate in the running sums, so you need to recompute them from scratch every now and then.
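
A minimal sketch of the running-sums idea, under a few assumptions: the function name `rolling_slope_intercept` and its `refresh` parameter are invented for this illustration, only the slope and intercept are computed (so ΣYi² is left out), and the usual closed-form expressions are used: slope = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²) and intercept = (ΣY − slope·ΣX) / n.

```python
import numpy as np


def rolling_slope_intercept(x, y, window, refresh=None):
    """Rolling least-squares slope and intercept from running sums.

    Hypothetical helper. If `refresh` is given, the sums are recomputed
    from the raw window every `refresh` rows to limit rounding drift.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    slope = np.full(n, np.nan)
    intercept = np.full(n, np.nan)
    sx = sxx = sy = sxy = 0.0
    for i in range(n):
        if refresh and i >= window and i % refresh == 0:
            # Restart: rebuild the sums directly from the current window
            xs = x[i - window + 1:i + 1]
            ys = y[i - window + 1:i + 1]
            sx, sxx = xs.sum(), (xs * xs).sum()
            sy, sxy = ys.sum(), (xs * ys).sum()
        else:
            # Add the point entering the window
            sx += x[i]
            sxx += x[i] * x[i]
            sy += y[i]
            sxy += x[i] * y[i]
            if i >= window:
                # Subtract the point that just left the window
                j = i - window
                sx -= x[j]
                sxx -= x[j] * x[j]
                sy -= y[j]
                sxy -= x[j] * y[j]
        if i >= window - 1:
            denom = window * sxx - sx * sx
            slope[i] = (window * sxy - sx * sy) / denom
            intercept[i] = (sy - slope[i] * sx) / window
    return slope, intercept


values = [4354, 7564, 657, 7876]
slope, intercept = rolling_slope_intercept(range(len(values)), values, window=3)
```

With a large, steadily growing `x` such as a long row index, the denominator subtracts two nearly equal large numbers, so setting `refresh` (or shifting `x` so each window starts near zero) is worth considering.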