
I have a dataframe like this:

           Date         Y         X1         X2          X3
22   2004-05-12  9.348158e-09  0.000081  0.000028     0.000036   
23   2004-05-13  9.285989e-09  0.000073  0.000081     0.000097   
24   2004-05-14  9.732308e-09  0.000085  0.000073     0.000096   
25   2004-05-17  2.235977e-08  0.000089  0.000085     0.000099   
26   2004-05-18  2.792661e-09  0.000034  0.000089     0.000150   
27   2004-05-19  9.745323e-09  0.000048  0.000034     0.000053 

......

1000   2004-05-20  1.835462e-09  0.000034  0.000048     0.000099   
1001   2004-05-21  3.529089e-09  0.000037  0.000034     0.000043   
1002   2004-05-24  3.453047e-09  0.000043  0.000037     0.000059   
1003   2004-05-25  2.963131e-09  0.000038  0.000043     0.000059   
1004   2004-05-26  1.390032e-09  0.000029  0.000038     0.000054   

I want to run a rolling 100-day window OLS regression estimation, which is:

First, for the 101st row, I run a regression of Y on X1, X2, X3 using the 1st to 100th rows, and estimate Y for the 101st row;

Then, for the 102nd row, I run a regression of Y on X1, X2, X3 using the 2nd to 101st rows, and estimate Y for the 102nd row;

Then, for the 103rd row, I run a regression of Y on X1, X2, X3 using the 3rd to 102nd rows, and estimate Y for the 103rd row;

......

Until the last row.

How can I do this?
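For reference, the procedure described above can be sketched directly with NumPy's least squares. The data below is made up (same column names as the frame above), so only the shape of the loop matters:

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the real data
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(rng.standard_normal((n, 3)), columns=['X1', 'X2', 'X3'])
df['Y'] = 0.5 + 2 * df.X1 - df.X2 + 3 * df.X3 + 0.1 * rng.standard_normal(n)

window = 100
X = np.column_stack([np.ones(n), df[['X1', 'X2', 'X3']].to_numpy()])  # intercept + regressors
y = df['Y'].to_numpy()

y_hat = np.full(n, np.nan)  # no prediction for the first `window` rows
for t in range(window, n):
    # fit on rows t-window .. t-1, then predict row t
    coef, *_ = np.linalg.lstsq(X[t - window:t], y[t - window:t], rcond=None)
    y_hat[t] = X[t] @ coef
df['Y_hat'] = y_hat
```

This is the brute-force version of what the answers below wrap in library calls.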

Cofeinnie Bonda

4 Answers

# Note: pd.stats.ols.MovingOLS was removed in pandas 0.20.0,
# so this only runs on older pandas versions.
model = pd.stats.ols.MovingOLS(y=df.Y, x=df[['X1', 'X2', 'X3']],
                               window_type='rolling', window=100, intercept=True)
df['Y_hat'] = model.y_predict
Alexander

statsmodels 0.11.0 added RollingOLS (Jan 2020):

from statsmodels.regression.rolling import RollingOLS

# add a constant column to regress with an intercept
df['const'] = 1

# fit
model = RollingOLS(endog=df['Y'], exog=df[['const', 'X1', 'X2', 'X3']], window=20)
rres = model.fit()
rres.params.tail()  # look at the last few intercepts and coefficients

Or use R-style regression formula

model = RollingOLS.from_formula('Y ~ X1 + X2 + X3' , data = df, window=20)
rres = model.fit()
rres.params.tail()
user3226167

I also needed to do some rolling regression and ran into the problem of the deprecated pandas.ols function. Below is my workaround.

Basically, I create an empty NumPy array first, then use numpy.polyfit to generate the regression values in a for loop. Then I add the NumPy arrays as columns of the pandas DataFrame. Hope that helps the community!

import numpy as np
import pandas as pd

data = pd.DataFrame({'x_data': x_data, 'y_data': y_data})

window = 300
regression = np.full((len(data.index), 2), np.nan)  # NaN where there is no full window
for row in range(len(data.index) - window + 1):     # stop before the window runs past the end
    y = data.y_data[row: row + window]
    x = data.x_data[row: row + window]
    regression[row] = np.polyfit(x, y, 1)           # [slope, intercept]

data['beta'] = regression[:, 0]
data['alpha'] = regression[:, 1]
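For this single-predictor case the rolling fit can also be vectorized with pandas' rolling covariance and variance, since the OLS slope is Cov(x, y)/Var(x). A sketch on made-up x_data/y_data (note these windows end at each row, rather than starting there as in the loop above):

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the real data
rng = np.random.default_rng(1)
x_data = rng.standard_normal(500)
y_data = 3.0 * x_data + 1.5 + 0.2 * rng.standard_normal(500)
data = pd.DataFrame({'x_data': x_data, 'y_data': y_data})

window = 300
# slope = Cov(x, y) / Var(x); intercept = mean(y) - slope * mean(x)
beta = (data['x_data'].rolling(window).cov(data['y_data'])
        / data['x_data'].rolling(window).var())
alpha = data['y_data'].rolling(window).mean() - beta * data['x_data'].rolling(window).mean()
```

This avoids the Python-level loop entirely, at the cost of only handling one regressor.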
Kiann
  • Nice approach, but I think there is an issue: `row` will go out of range since you ask to return data within [row : row + 300]. I think that in the for loop you need to iterate only up to `len(data.index) - 300` – George Pipis Jan 28 '21 at 12:33

I came here looking for an easy solution with pandas but did not find quite what I was looking for. Here is my solution using plain lists and a for loop; it is likely not the fastest, but I found it very simple:

from scipy.stats import linregress
import numpy as np

measurement_index = [x for x in range(1,21)]
value = [x**2 for x in measurement_index]

for idx in range(len(value)):
    if idx >= 3:  # a full 3-point window exists before the current index
        window_value = value[idx - 3:idx]
        window_index = measurement_index[idx - 3:idx]
        slope = linregress(np.array(window_index), np.array(window_value)).slope
        print(slope)
Ruben