Imagine having a large data frame consisting of two columns x and y. For example:

df = pd.DataFrame(
  {
    "x": np.linspace(0, 10, num=POINTS_NUM, endpoint=True), 
    "y": y
  }
)

My objective is to efficiently determine the slope and intercept of linear regressions computed over rolling subsets of the data. So far I have used rolling with a for loop (the window size of 20 is just an example):

regression_data = []
for window in df.rolling(window=int(20)):
    if window.shape[0] < 20:
        regression_data.append([None, None, None, None])
        continue

    lin_reg = linregress(window["x"], window["y"])
    regression_data.append(
        [window["x"].iloc[0], window["x"].iloc[-1], lin_reg.slope, lin_reg.intercept]
    )

It works as I want, but it feels very un-Pythonic. I am trying to figure out how to achieve this using aggregate or apply on the resulting Window object. So far, I haven't been successful.

What I tried is:

def lin_reg_for_win(window):
    lin_reg = linregress(window["x"], window["y"])
    return [window.iloc[0], window.iloc[-1], lin_reg.slope, lin_reg.intercept]

df.rolling(2, method="table", min_periods=0).agg(lambda x: lin_reg_for_win(x))

But this raises ValueError: Data must be 1-dimensional.

Dror
  • FWIW, [this](https://stackoverflow.com/q/32353156/671013) might be related but it didn't help me enough. – Dror Jul 08 '22 at 12:30

1 Answer

You could use the built-in map function to apply the function to each window, since the object returned by .rolling is iterable:

map(lin_reg_for_win, df.rolling(2, method='table', min_periods=0))
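
Note that map is lazy, so nothing is fitted until the result is consumed. The following is a minimal sketch of one way to collect the results into a data frame. It rests on some assumptions: fit_window is a hypothetical helper, not the question's lin_reg_for_win; np.polyfit stands in for scipy's linregress to keep the dependencies to NumPy and pandas; and the data is a made-up noiseless line.

```python
import numpy as np
import pandas as pd

WINDOW = 20  # example window size, as in the question


def fit_window(window: pd.DataFrame) -> list:
    """Return [x_start, x_end, slope, intercept] for one window (hypothetical helper)."""
    if len(window) < WINDOW:
        # The leading windows are shorter than WINDOW; skip the fit for them.
        return [np.nan, np.nan, np.nan, np.nan]
    # np.polyfit with deg=1 returns the coefficients as (slope, intercept).
    slope, intercept = np.polyfit(window["x"], window["y"], deg=1)
    return [window["x"].iloc[0], window["x"].iloc[-1], slope, intercept]


# Assumed example data: y = 2x + 1 without noise.
x = np.linspace(0, 10, num=100, endpoint=True)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0})

# list() forces the lazy map to run one fit per window.
rows = list(map(fit_window, df.rolling(window=WINDOW)))
regression_data = pd.DataFrame(
    rows, columns=["x_start", "x_end", "slope", "intercept"]
)
```

This still performs one fit per window, so it is a tidier spelling of the for loop rather than a faster one.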
Zach Flanders
  • But this is rather the same as the for-loop, right? Isn't there a more vectorized approach? – Dror Jul 08 '22 at 13:37
  • It does look more pythonic though. [Here's](https://github.com/drorata/piecewise-linear-regression/blob/2bf3842e6ef7d99b089144fe47e2f4a9e665b0f7/main_app.py#L69-L71) how I used it. – Dror Jul 08 '22 at 13:53
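
On the question of a more vectorized approach: for simple linear regression the slope equals cov(x, y) / var(x) and the intercept equals mean(y) - slope * mean(x), so one possible route is to build both from pandas' built-in rolling statistics instead of fitting each window separately. The sketch below is not taken from the thread; rolling_linreg is a hypothetical helper, and POINTS_NUM and y are assumed example values.

```python
import numpy as np
import pandas as pd


def rolling_linreg(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Slope and intercept of y ~ x over each full rolling window."""
    x_roll = df["x"].rolling(window)
    # Rolling cov and var both default to ddof=1, so the factor cancels.
    slope = x_roll.cov(df["y"]) / x_roll.var()
    intercept = df["y"].rolling(window).mean() - slope * x_roll.mean()
    return pd.DataFrame(
        {
            # First and last x of the window ending at each row.
            "x_start": df["x"].shift(window - 1),
            "x_end": df["x"],
            "slope": slope,
            "intercept": intercept,
        }
    )


# Assumed example data: a noisy line y = 3x - 1.
POINTS_NUM = 200
rng = np.random.default_rng(0)
x = np.linspace(0, 10, num=POINTS_NUM, endpoint=True)
y = 3.0 * x - 1.0 + rng.normal(scale=0.1, size=POINTS_NUM)
df = pd.DataFrame({"x": x, "y": y})

result = rolling_linreg(df, window=20)
```

Rows that do not yet have a full window come out as NaN, which mirrors the None rows in the original loop. Only slope and intercept are produced here; other linregress outputs such as rvalue would need their own rolling formulas.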