I have a dataframe and would like to use rolling to take two columns in each window and run a regression on them (regress the first column on the second). I have seen people use apply to run a single function on rolling data, for example:
def multi_period_return(period_returns):
    return np.prod(period_returns + 1) - 1

pr = data.SP500.pct_change()  # period return
r = pr.rolling('360D').apply(multi_period_return)
My data (sp500data) is:
caldt,spreturn,shifted
1962-07-05,0.0056,0.0112
1962-07-06,-0.0112,0.0056
1962-07-09,0.0067,-0.0112
1962-07-10,0.011,0.0067
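For reproducibility, the sample can be loaded as follows. This is a minimal sketch, assuming the rows are available as CSV text (the `csv_text` variable is only for illustration) and that `caldt` should become a `DatetimeIndex`, which an offset window such as `'360D'` requires:

```python
import io

import pandas as pd

# Illustrative only: the four sample rows from the question, inlined as CSV text.
csv_text = """caldt,spreturn,shifted
1962-07-05,0.0056,0.0112
1962-07-06,-0.0112,0.0056
1962-07-09,0.0067,-0.0112
1962-07-10,0.011,0.0067
"""

# Parse caldt as dates and use it as the index so that
# time-based windows like '360D' can be used with rolling().
sp500data = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["caldt"], index_col="caldt"
)
print(sp500data)
```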
Following the pattern above, which works well, I wrote:
def firstcoef(spdf):
    return sm.OLS(spdf['spreturn'],spdf['shifted']).fit().params[0]

r = sp500data.rolling(window='360D').apply(firstcoef)
But this code does not work, and I get the following error:
Traceback (most recent call last):
File "C:/Users/moham/PycharmProjects/pythonProject1/main.py", line 19, in <module>
r = sp500data.rolling(window='360D').apply(firstcoef)
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\window\rolling.py", line 2059, in apply
return super().apply(
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\window\rolling.py", line 1388, in apply
return self._apply(
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\window\rolling.py", line 586, in _apply
result = np.apply_along_axis(calc, self.axis, values)
File "<__array_function__ internals>", line 5, in apply_along_axis
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\numpy\lib\shape_base.py", line 379, in apply_along_axis
res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\window\rolling.py", line 576, in calc
return func(x, start, end, min_periods)
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\window\rolling.py", line 1415, in apply_func
return window_func(values, begin, end, min_periods)
File "pandas\_libs\window\aggregations.pyx", line 1441, in pandas._libs.window.aggregations.roll_generic_variable
File "C:/Users/moham/PycharmProjects/pythonProject1/main.py", line 10, in firstcoef
return sm.OLS(spdf['spreturn'],spdf['shifted']).fit().params[0]
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
return self._get_value(key)
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\series.py", line 991, in _get_value
loc = self.index.get_loc(label)
File "C:\Users\moham\PycharmProjects\pythonProject1\venv\lib\site-packages\pandas\core\indexes\datetimes.py", line 605, in get_loc
raise KeyError(key) from err
KeyError: 'spreturn'
Process finished with exit code 1
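The traceback ends inside `Series.__getitem__` with a lookup on the datetime index, which suggests that `firstcoef` receives one column at a time rather than the whole frame. A small check of that, assuming the sample data (rebuilt inline here) and a hypothetical helper `inspect_window` that only records what it is given:

```python
import pandas as pd

# Sample rows from the question, rebuilt inline so the check is self-contained.
sp500data = pd.DataFrame(
    {
        "spreturn": [0.0056, -0.0112, 0.0067, 0.011],
        "shifted": [0.0112, 0.0056, -0.0112, 0.0067],
    },
    index=pd.to_datetime(
        ["1962-07-05", "1962-07-06", "1962-07-09", "1962-07-10"]
    ),
)

seen_types = []


def inspect_window(window):
    # Hypothetical helper: remember the type of the object that
    # rolling().apply passes in, and return a dummy float.
    seen_types.append(type(window).__name__)
    return 0.0


sp500data.rolling(window="360D").apply(inspect_window, raw=False)
print(set(seen_types))
```

If this prints only `Series`, then `spdf['spreturn']` inside `firstcoef` is looking up the label `'spreturn'` in a date index, which would explain the `KeyError`.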
I want to extract the first coefficient of each rolling regression, collect the coefficients in a data frame, and then plot them. What is the proper way to revise my code?
My desired output would look something like this:
1962-07-05, 0.09
1962-07-06, 0.011
1962-07-09, 0.02
1962-07-10, 0.03
1962-07-11, 0.04
The values are model parameters; specifically, I want the first parameter of each fit, params[0].
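One possible way to get such a series is to slice each window by hand and fit on the sub-frame, since `rolling(...).apply` works column by column. This is a minimal sketch, not a definitive solution. It assumes a regression without an intercept (which is what `sm.OLS(y, x)` fits when no constant is added) and uses `np.linalg.lstsq` as a stand-in for statsmodels so that the block runs without it. The names `first_coef`, `window` and `coefs` are illustrative:

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt inline so the sketch is self-contained.
sp500data = pd.DataFrame(
    {
        "spreturn": [0.0056, -0.0112, 0.0067, 0.011],
        "shifted": [0.0112, 0.0056, -0.0112, 0.0067],
    },
    index=pd.to_datetime(
        ["1962-07-05", "1962-07-06", "1962-07-09", "1962-07-10"]
    ),
)


def first_coef(window_df):
    # Least squares of spreturn on shifted with no intercept.
    # Stand-in for sm.OLS(window_df['spreturn'], window_df['shifted']).fit().params
    x = window_df[["shifted"]].to_numpy()
    y = window_df["spreturn"].to_numpy()
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef[0]


window = pd.Timedelta("360D")
idx = sp500data.index

# For each date, keep the rows in (end - 360 days, end], which mirrors the
# default right-closed offset window of rolling('360D'), and fit on them.
coefs = pd.Series(
    [first_coef(sp500data[(idx > end - window) & (idx <= end)]) for end in idx],
    index=idx,
    name="coef",
)
print(coefs)
```

The result is a Series indexed by date, so `coefs.to_frame()` should give a data frame and `coefs.plot()` should draw it if matplotlib is installed. The boolean mask is recomputed for every row, so this is quadratic in the number of rows; it is meant to show the idea rather than to be fast.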