I have a DataFrame with a daily DatetimeIndex. For each date, I want to take a rolling window over the previous two years, sample that window every 5th day, and then run a function on the sampled data.
In this case I want to run the regression y ~ X (using the DataFrame below).
The output should be a time-indexed series with a Beta value for each day (ignoring the first 2 years).
Currently I am using a row-based loop, but it is extremely slow.
I feel there should be an easier way to accomplish this.
Thanks in advance.
import numpy as np
import pandas as pd

date_range = pd.date_range('2015-01-01', '2019-12-31')
df = pd.DataFrame(np.random.rand(len(date_range), 2), index=date_range, columns=['X', 'y'])
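To make the goal concrete, here is a rough sketch of the calculation for a single date. The example date, the seeded generator, and the names `sample_dates` and `window` are only illustrative, and `np.polyfit` stands in for the statsmodels fit to keep the snippet short.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, seeded so the sketch is reproducible.
rng = np.random.default_rng(0)
dates = pd.date_range('2015-01-01', '2019-12-31')
df = pd.DataFrame(rng.random((len(dates), 2)), index=dates, columns=['X', 'y'])

date = pd.Timestamp('2018-06-30')          # hypothetical example date
start = date - pd.DateOffset(years=2)      # start of the two-year window

# Every 5th calendar day from the window start up to the example date
sample_dates = pd.date_range(start, date, freq='5D')
window = df.loc[df.index.isin(sample_dates)]

# Slope of y ~ X with an intercept; np.polyfit returns [slope, intercept]
beta, intercept = np.polyfit(window['X'].to_numpy(), window['y'].to_numpy(), deg=1)
print(len(window), beta)
```

What I am after is this `beta` (plus R-squared and the residual standard error) for every date in the index.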
Code I am currently using:
import statsmodels.api as sm
from pandas.tseries.offsets import DateOffset

def rolling_stats(X, y, years_window=2):
    """For each date in X.index, regress y on X over a trailing window sampled every 5 days."""
    assert len(X) == len(y)
    idx = X.index
    # Results start out as NaN and are filled in date by date
    df = pd.DataFrame(np.nan, columns=['Beta', 'RSQ', 'StdErr'], index=idx)
    for date in idx:
        start_date = date - DateOffset(years=years_window)
        # Every 5th calendar day from the window start up to the current date
        date_range = pd.date_range(start_date, date, freq='5D')
        try:
            X_reg = X.loc[X.index.isin(date_range)]
            y_reg = y.loc[y.index.isin(date_range)]
            assert len(X_reg) == len(y_reg)
            X_c = sm.add_constant(X_reg)
            result = sm.OLS(y_reg, X_c).fit()
            df.loc[date, 'RSQ'] = result.rsquared
            # params is a Series (const, slope), so select the slope by position
            df.loc[date, 'Beta'] = result.params.iloc[1]
            # Residual standard error of the regression
            df.loc[date, 'StdErr'] = np.sqrt(result.mse_resid)
        except Exception:
            # The fit failed (e.g. too few points in the window): keep NaN
            df.loc[date, ['Beta', 'RSQ', 'StdErr']] = np.nan
    return df
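One direction that might be cheaper per iteration, as a sketch only: drop statsmodels and the per-row `.loc` assignments and compute the simple-regression statistics in closed form on NumPy slices. The function name `rolling_stats_fast` and its `step` parameter are hypothetical. The sketch assumes a gap-free daily index shared by `X` and `y`, so that "every 5th calendar day from the window start" is the same as "every 5th row", and unlike the loop above it leaves dates without a full two-year history as NaN. I have not benchmarked it.

```python
import numpy as np
import pandas as pd

def rolling_stats_fast(X, y, years_window=2, step=5):
    """Closed-form simple OLS of y on X over strided trailing windows (sketch)."""
    idx = X.index
    x_arr = X.to_numpy(dtype=float)
    y_arr = y.to_numpy(dtype=float)
    out = np.full((len(idx), 3), np.nan)

    # Row position of each window start; -1 where the start precedes the data
    starts = idx.get_indexer(idx - pd.DateOffset(years=years_window))

    for end, start in enumerate(starts):
        if start < 0:
            continue  # not enough history yet, leave NaN
        xs = x_arr[start:end + 1:step]
        ys = y_arr[start:end + 1:step]
        n = len(xs)
        xd = xs - xs.mean()
        yd = ys - ys.mean()
        sxx, syy, sxy = xd @ xd, yd @ yd, xd @ yd
        beta = sxy / sxx                      # slope of y ~ X with intercept
        ssr = syy - beta * sxy                # residual sum of squares
        rsq = sxy ** 2 / (sxx * syy)
        stderr = np.sqrt(ssr / (n - 2))       # sqrt of residual mean square
        out[end] = beta, rsq, stderr
    return pd.DataFrame(out, index=idx, columns=['Beta', 'RSQ', 'StdErr'])
```

It would be called the same way as the loop version, e.g. `rolling_stats_fast(df['X'], df['y'])`.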