Imagine having a large data frame consisting of two columns, x and y. For example:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "x": np.linspace(0, 10, num=POINTS_NUM, endpoint=True),
        "y": y,
    }
)
My objective is to efficiently determine the slope and intercept of linear regressions fitted over rolling subsets of the data. What I did was use rolling and a for loop (the window size of 20 is just an example):
from scipy.stats import linregress

regression_data = []
for window in df.rolling(window=20):
    # The first windows are shorter than 20 rows; skip them.
    if window.shape[0] < 20:
        regression_data.append([None, None, None, None])
        continue
    lin_reg = linregress(window["x"], window["y"])
    regression_data.append(
        [window["x"].iloc[0], window["x"].iloc[-1], lin_reg.slope, lin_reg.intercept]
    )
It works as I want, but I feel this is very much non-pythonic. I am trying to figure out how to achieve this using aggregate or apply on the resulting Window object. So far, I haven't been successful.
What I tried is:
def lin_reg_for_win(window):
    lin_reg = linregress(window["x"], window["y"])
    return [window.iloc[0], window.iloc[-1], lin_reg.slope, lin_reg.intercept]

df.rolling(2, method="table", min_periods=0).agg(lambda x: lin_reg_for_win(x))
But the returned error is ValueError: Data must be 1-dimensional.
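For reference, one possible direction is to avoid returning a list from the window function at all, since rolling apply and agg are designed for functions that reduce each window to a single value. The sketch below instead builds the regression from the closed-form least-squares expressions, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x), using only built-in rolling reductions. It is a minimal sketch, not a drop-in replacement for linregress: the values of POINTS_NUM and y, and the names WINDOW, result, x_start and x_end, are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Assumed example data: the question leaves POINTS_NUM and y unspecified.
POINTS_NUM = 200
WINDOW = 20  # hypothetical name for the window size used in the question
rng = np.random.default_rng(0)
x = np.linspace(0, 10, num=POINTS_NUM, endpoint=True)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=POINTS_NUM)
df = pd.DataFrame({"x": x, "y": y})

roll_x = df["x"].rolling(WINDOW)
roll_y = df["y"].rolling(WINDOW)

# Closed-form ordinary least squares per window.
# cov and var both default to ddof=1, so the normalisation cancels.
slope = roll_x.cov(df["y"]) / roll_x.var()
intercept = roll_y.mean() - slope * roll_x.mean()

result = pd.DataFrame(
    {
        "x_start": df["x"].shift(WINDOW - 1),  # first x of each window
        "x_end": df["x"],                      # last x of each window
        "slope": slope,
        "intercept": intercept,
    }
)
```

The first WINDOW - 1 rows of result are NaN, which plays the role of the None rows in the loop version. This only reproduces the slope and intercept; other linregress outputs such as rvalue or stderr would need their own expressions.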