I'm interested in computing a statistic over a rolling window. The statistic will be computed over multiple columns. Here is a toy example calculating regression coefficients over a rolling window.

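import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def regression_coef(df):
    # An empty window has nothing to fit; return NaN for both coefficients.
    if df.shape[0] == 0:
        return np.array([np.nan, np.nan])
    y = df.y.values
    X = df.drop('y', axis=1).values  # every column except the target
    # Fit ordinary least squares and keep the rounded slope coefficients.
    reg = LinearRegression().fit(X, y).coef_.round(2)
    return reg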

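# One observation every 5 seconds for an hour (720 rows).
time = np.arange(5, 3605, 5)
x = np.random.normal(size=time.size)
z = np.random.normal(size=time.size)
# True coefficients are 2 for x and 1 for z, plus unit-variance noise.
y = 2*x + z + np.random.normal(size=time.size)
df = pd.DataFrame({'x': x, 'z': z, 'y': y}, index=pd.to_datetime(time, unit='s'))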

When I call df.rolling('20 T').apply(regression_coef) I get the following error: AttributeError: 'numpy.ndarray' object has no attribute 'y'. This leads me to believe that df.rolling computes statistics over each column individually, rather than passing all observations within the 20-minute window to the function at once.
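A small probe can check that reading of the error. This is only a sketch and not part of the original example: `demo`, `probe`, and `seen_ndims` are made-up names, the window is shortened to 20 seconds, and `raw=True` is passed explicitly so that each window arrives as a NumPy array regardless of the pandas version's default.

```python
import numpy as np
import pandas as pd

# Tiny two-column frame with one row every 5 seconds (hypothetical data).
idx = pd.to_datetime(np.arange(5, 65, 5), unit="s")
demo = pd.DataFrame({"x": np.arange(12.0), "y": np.arange(12.0) * 2}, index=idx)

seen_ndims = []  # dimensionality of every object handed to the function

def probe(window):
    # Record how many dimensions the window has, then return a scalar,
    # because Rolling.apply expects one number per window.
    seen_ndims.append(window.ndim)
    return float(window.size)

out = demo.rolling("20s").apply(probe, raw=True)
print(set(seen_ndims))
```

If every recorded dimensionality is 1, the function only ever sees one column's values at a time, which is consistent with the AttributeError: a one-dimensional array has no `y` column to look up.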

How can I achieve what I want? That is, how can I compute regression_coef over a rolling window? In particular, I'm interested in whether this can be solved with offset-based windows using the existing pandas API.

Demetri Pananos
  • Possible duplicate of [How to invoke pandas.rolling.apply with parameters from multiple column?](https://stackoverflow.com/questions/38878917/how-to-invoke-pandas-rolling-apply-with-parameters-from-multiple-column) – tvgriek Apr 16 '19 at 20:15
  • @tvgriek this problem uses offsets rather than number of observations. I think it is distinct from the question you linked. – Demetri Pananos Apr 16 '19 at 21:46
  • have you tried the custom roll function defined in there? I have checked the rolling behavior, it stacks all the columns and goes over all values. total length: 3x 720. You could implement a custom function that processes a stacked column. – tvgriek Apr 16 '19 at 21:53
  • @tvgriek I've tried the function in that answer. It doesn't work with offsets. I'm capable of writing my own function to solve this, but I'd like to know if it is possible with the existing pandas API. – Demetri Pananos Apr 16 '19 at 22:00
  • @Demetri Pananos Have you found a solution for this? I have a similar question that requires all columns of `df` to be passed to the function. Hoping for a pandas API based solution. See: [link](https://stackoverflow.com/questions/57345510/rolling-apply-on-custom-function-that-requires-multiple-columns-of-dataframe-to?noredirect=1#comment101184613_57345510) – Balki Aug 05 '19 at 10:00
  • @Balki No I haven't found a satisfactory answer. – Demetri Pananos Aug 05 '19 at 16:12
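
One possible workaround, offered as a sketch rather than a confirmed answer from this thread: Rolling objects can be iterated directly, and each iteration yields the full multi-column window as a DataFrame. This assumes pandas 1.1 or later, where iteration over Rolling objects is available. The helper `ols_coef` is a made-up name, `np.linalg.lstsq` with an explicit intercept column is swapped in for `LinearRegression` to keep the example free of scikit-learn, the `"20min"` alias replaces `'20 T'`, a seeded generator replaces `np.random.normal`, and returning NaN for windows with too few rows is a choice made here, not something from the question.

```python
import numpy as np
import pandas as pd

def ols_coef(window, target="y"):
    """Slope coefficients of an OLS fit with intercept (hypothetical helper)."""
    X = window.drop(columns=target).to_numpy()
    y = window[target].to_numpy()
    # Assumption: with too few rows the fit is not meaningful, so return NaN.
    if len(window) <= X.shape[1] + 1:
        return np.full(X.shape[1], np.nan)
    # Prepend a column of ones so the fit includes an intercept.
    A = np.column_stack([np.ones(len(window)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:].round(2)  # drop the intercept, keep the slopes

# Same toy data as in the question, with a seeded generator for repeatability.
rng = np.random.default_rng(0)
time = np.arange(5, 3605, 5)
x = rng.normal(size=time.size)
z = rng.normal(size=time.size)
y = 2 * x + z + rng.normal(size=time.size)
df = pd.DataFrame({"x": x, "z": z, "y": y}, index=pd.to_datetime(time, unit="s"))

# Iterating a Rolling object yields one DataFrame per row of df,
# holding every column for the observations inside that time window.
rows = [ols_coef(window) for window in df.rolling("20min")]
coefs = pd.DataFrame(rows, index=df.index, columns=["x", "z"])
print(coefs.tail())
```

This stays within the pandas rolling machinery for working out the window bounds, but the loop itself runs in Python, so it may be slow on large frames.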

0 Answers