Pandas rolling window to return an array

Question

Here is a sample code.

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
df['C'] = df.B.rolling(window=3)

Output:

           A         B                                       C
0 -0.108897  1.877987  Rolling [window=3,center=False,axis=0]
1 -1.276055 -0.424382  Rolling [window=3,center=False,axis=0]
2  1.578561 -1.094649  Rolling [window=3,center=False,axis=0]
3 -0.443294  1.683261  Rolling [window=3,center=False,axis=0]
4  0.674124  0.281077  Rolling [window=3,center=False,axis=0]
5  0.587773  0.697557  Rolling [window=3,center=False,axis=0]
6 -0.258038 -1.230902  Rolling [window=3,center=False,axis=0]
7 -0.443269  0.647107  Rolling [window=3,center=False,axis=0]
8  0.347187  0.753585  Rolling [window=3,center=False,axis=0]
9 -0.369179  0.975155  Rolling [window=3,center=False,axis=0]

I want my 'C' column to be an array like [0.1231, -1.132, 0.8766]. I tried using rolling apply but in vain.

Expected Output:

       A         B                 C
0 -0.108897  1.877987  []
1 -1.276055 -0.424382  []
2  1.578561 -1.094649  [-1.094649, -0.424382, 1.877987]
3 -0.443294  1.683261  [1.683261, -1.094649, -0.424382]
4  0.674124  0.281077  [0.281077, 1.683261, -1.094649]
5  0.587773  0.697557  [0.697557, 0.281077, 1.683261]
6 -0.258038 -1.230902  [-1.230902, 0.697557, 0.281077]
7 -0.443269  0.647107  [0.647107, -1.230902, 0.697557]
8  0.347187  0.753585  [0.753585, 0.647107, -1.230902]
9 -0.369179  0.975155  [0.975155, 0.753585, 0.647107]

No, this isn't possible. Each rolling window computation must return an aggregated result. If your function can't guarantee that, then you probably have to look at some other option. — cs95, Nov 25 '17 at 02:08
Thanks. Is it possible with loc/iloc/ix/others? I want to lookback a window and get the array. — revendar, Nov 25 '17 at 04:17
I'd like to see your actual function and expected output for this data. What exactly are you trying to do? Obviously, I cannot help you unless I know what your real function does. — cs95, Nov 25 '17 at 04:21
Sure. I'm trying to multiply by an X factor through the array. For eg: [ 1, 2, 3, 4] will return [ 1.5, 3, 4.5, 6] when the factor is 1.5. Im trying to provide a clear dataframe — revendar, Nov 25 '17 at 05:09
Alright. Why do you need a rolling window? How are each window computations related to each other? How do they fit into the final dataframe? Say, for a column of `[1, 2, 3, 4, 5, 6, 7, 8]` and `window_size=3`, what is your desired output? (This is very important information, I'd also like you to edit your question... thanks) — cs95, Nov 25 '17 at 05:11
Updated Question with expected output. I just need the window as an array. — revendar, Nov 25 '17 at 05:34
One thing confuses me... and that is how A and B are used in your function. From where does the result come from? Do you do this striding separately for A? Is there another column D that should be the result of B's strides? Or is there only one column in the result? — cs95, Nov 25 '17 at 05:41
Updated Question to remove the function. Now the expectation is simple. 'C' should be an array of rolling window of B. — revendar, Nov 25 '17 at 06:23
@cs95 It is no longer true, that you have to aggregate, see my answer at the bottom. — Philipp, Dec 14 '20 at 09:56

Philipp · Answer 1 · 2022-03-25T08:35:40.170

30

Since pandas 1.1 rolling objects are iterable.

For a list of lists:

df['C'] = [window.to_list() for window in df.B.rolling(window=3)]

For a Series of Series's do:

df['C'] = pd.Series(df.B.rolling(window=3))

Also checkout the rolling function for parameters.

edited Mar 25 '22 at 08:35

answered Dec 09 '20 at 11:50

Philipp

1,191
1
14
16

2

Exactly what I needed, you can replace `window.to_list()` with any list and get back a list of lists, which you can easily turn into a dataframe etc. – matthewpark319 Dec 10 '20 at 21:49
1

This is the best answer for 2021. – AliS Jan 08 '22 at 11:05
This is the best answer also for 2022. – SimAzz Oct 03 '22 at 15:21

cs95 · Accepted Answer · 2017-11-25T07:41:05.733

12

You could use np.stride_tricks:

import numpy as np
as_strided = np.lib.stride_tricks.as_strided  

df

          A         B
0 -0.272824 -1.606357
1 -0.350643  0.000510
2  0.247222  1.627117
3 -1.601180  0.550903
4  0.803039 -1.231291
5 -0.536713 -0.313384
6 -0.840931 -0.675352
7 -0.930186 -0.189356
8  0.151349  0.522533
9 -0.046146  0.507406

win = 3  # window size

# https://stackoverflow.com/a/47483615/4909087
v = as_strided(df.B, (len(df) - (win - 1), win), (df.B.values.strides * 2))

v
array([[ -1.60635669e+00,   5.10129842e-04,   1.62711678e+00],
       [  5.10129842e-04,   1.62711678e+00,   5.50902812e-01],
       [  1.62711678e+00,   5.50902812e-01,  -1.23129111e+00],
       [  5.50902812e-01,  -1.23129111e+00,  -3.13383794e-01],
       [ -1.23129111e+00,  -3.13383794e-01,  -6.75352179e-01],
       [ -3.13383794e-01,  -6.75352179e-01,  -1.89356194e-01],
       [ -6.75352179e-01,  -1.89356194e-01,   5.22532550e-01],
       [ -1.89356194e-01,   5.22532550e-01,   5.07405549e-01]])

df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])
df

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117  [-1.606356691642917, 0.0005101298424200881, 1....
3 -1.601180  0.550903  [0.0005101298424200881, 1.6271167809032248, 0....
4  0.803039 -1.231291  [1.6271167809032248, 0.5509028122535129, -1.23...
5 -0.536713 -0.313384  [0.5509028122535129, -1.2312911105674484, -0.3...
6 -0.840931 -0.675352  [-1.2312911105674484, -0.3133837943758246, -0....
7 -0.930186 -0.189356  [-0.3133837943758246, -0.6753521794378446, -0....
8  0.151349  0.522533  [-0.6753521794378446, -0.18935619377656243, 0....
9 -0.046146  0.507406  [-0.18935619377656243, 0.52253255045267, 0.507...

edited Nov 25 '17 at 07:41

answered Nov 25 '17 at 07:38

cs95

379,657
97
704
746

1

Oh god I was here till now https://stackoverflow.com/a/46199050/4800652, thought of answering you already posted it – Bharath M Shetty Nov 25 '17 at 07:39
@Bharath yeah... I had some help from here: https://stackoverflow.com/a/47483615/4909087 – cs95 Nov 25 '17 at 07:41
Awesome Thanks! Does this work for forward looking as well? Like get next n rows as array? – revendar Nov 25 '17 at 07:43
@revendar Hmm... I'm not 100% sure but I think the stride_tricks code is applicable. After that, you just need to figure out how to insert it into your dataframe (for example, how would you shift the data and by how much). Hope that makes sense. – cs95 Nov 25 '17 at 07:44
@cᴏʟᴅsᴘᴇᴇᴅ what do think about zipping approach? – Bharath M Shetty Nov 25 '17 at 07:53
@revendar when you see an answer that helped you vote and appreciate it,just the way you accept the answer – Bharath M Shetty Nov 25 '17 at 07:55
I'm currently using shift as i need only few values. Will try using strides. – revendar Nov 25 '17 at 09:28
@revendar Use `strides` with caution, as it is very easy to reach invalid memory addresses as no stride checking is made by numpy. – cs95 Nov 25 '17 at 09:31
This does not work if each element of the initial array is also an array! – leoschet Sep 04 '18 at 10:51
@leoschet it's not meant to. That is quite possibly the worst way anyone could store their data. It kills all hope of vectorization. – cs95 Sep 04 '18 at 20:18
From what you said, I can't see the point of this question either. Time series forecasting usually uses a rolling window, since I'm dealing with a (time series) classification problem, I wanted to do the same as the question requested. Looking for a general solution, I was looking forward a piece of code which would work both with arrays of single elements and arrays of arrays. @coldspeed – leoschet Sep 04 '18 at 22:40

score 11 · Answer 3 · answered Jan 30 '19 at 14:36

Let's using this pandas approach with a rolling apply trick:

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
list_of_values = []
df.B.rolling(3).apply(lambda x: list_of_values.append(x.values) or 0, raw=False)
df.loc[2:,'C'] = pd.Series(list_of_values).values
df

Output:

          A         B                                                                  C
0  1.610085  0.354823                                                                NaN
1 -0.241446 -0.304952                                                                NaN
2  0.524812 -0.240972  [0.35482336179318674, -0.30495156795594963, -0.24097191924555197]
3  0.767354  0.281625   [-0.30495156795594963, -0.24097191924555197, 0.2816249674055174]
4 -0.349844 -0.533781    [-0.24097191924555197, 0.2816249674055174, -0.5337811449574766]
5 -0.174189  0.133795     [0.2816249674055174, -0.5337811449574766, 0.13379518286397707]
6  2.799437 -0.978349    [-0.5337811449574766, 0.13379518286397707, -0.9783488211443795]
7  0.250129  0.289782     [0.13379518286397707, -0.9783488211443795, 0.2897823417165459]
8 -0.385259 -0.286399    [-0.9783488211443795, 0.2897823417165459, -0.28639931887491943]
9 -0.755363 -1.010891    [0.2897823417165459, -0.28639931887491943, -1.0108913605575793]

Nice trick. It also works with `raw=True` and then you can append `x` directly without the need of casting the series into `np.array`. — mac13k, Feb 02 '20 at 15:27
I mean this line: `df.B.rolling(3).apply(lambda x: list_of_values.append(x) or 0, raw=True)` — mac13k, Feb 02 '20 at 15:28

score 5 · Answer 4 · answered Jun 15 '21 at 19:38

In newer numpy versions there is a sliding_window_view().

It provides identical to as_strided() arrays, but with more transparent syntax.

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_window_view(x, 3)

>>>
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

But be aware that pandas rolling will add few nans (window_size - 1) at the start because it uses padding. You can check it like this:

x.rolling(3).sum()

>>>
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
5    15.0
6    18.0
7    21.0
8    24.0
dtype: float64

sliding_window_view(x, 3).sum(axis=1)
>>>
array([ 6,  9, 12, 15, 18, 21, 24])

So real corresponding array should be:

c = np.array([[nan, nan,  1.],
              [nan,  1.,  2.],
              [ 1.,  2.,  3.],
              [ 2.,  3.,  4.],
              [ 3.,  4.,  5.],
              [ 4.,  5.,  6.],
              [ 5.,  6.,  7.],
              [ 6.,  7.,  8.],
              [ 7.,  8.,  9.]])

c.sum(axis=1)
>>>
array([nan, nan,  6.,  9., 12., 15., 18., 21., 24.])

Bharath M Shetty · Answer 5 · 2017-11-25T07:57:50.930

Perhaps zipping would also help in your case i.e

def get_list(x,m) : return list(zip(*(x[i:] for i in range(m))))

# get_list(df['B'],3) would return 

[(-1.606357, 0.0005099999999999999, 1.627117),
 (0.0005099999999999999, 1.627117, 0.5509029999999999),
 (1.627117, 0.5509029999999999, -1.231291),
 (0.5509029999999999, -1.231291, -0.313384),
 (-1.231291, -0.313384, -0.6753520000000001),
 (-0.313384, -0.6753520000000001, -0.189356),
 (-0.6753520000000001, -0.189356, 0.522533),
 (-0.189356, 0.522533, 0.507406)]

df['C'] = pd.Series(get_list(df['B'],3), index=df.index[3 - 1:])
# Little help form @coldspeed

print(df)

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117       (-1.606357, 0.0005099999999999999, 1.627117)
3 -1.601180  0.550903  (0.0005099999999999999, 1.627117, 0.5509029999...
4  0.803039 -1.231291          (1.627117, 0.5509029999999999, -1.231291)
5 -0.536713 -0.313384         (0.5509029999999999, -1.231291, -0.313384)
6 -0.840931 -0.675352        (-1.231291, -0.313384, -0.6753520000000001)
7 -0.930186 -0.189356        (-0.313384, -0.6753520000000001, -0.189356)
8  0.151349  0.522533         (-0.6753520000000001, -0.189356, 0.522533)
9 -0.046146  0.507406                    (-0.189356, 0.522533, 0.507406)

score 0 · Answer 6 · answered Jun 15 '21 at 20:02

0

Here is another way:

df.join(pd.concat(df['B'].rolling(window=3),axis=1).apply(lambda x: x.dropna().tolist()).reset_index(drop=True).loc[2:].rename('C'))

answered Jun 15 '21 at 20:02

rhug123

7,893
1
9
24

score 0 · Answer 7 · answered Dec 10 '22 at 14:17

    def function1(ss:pd.Series):
        ss
        if len(ss)<3:
            df.loc[ss.index.max(),'col1']='[]'
        else:
            df.loc[ss.index.max(),'col1']=str(ss.tolist())
        return 0
    
    df.B.rolling(3,1).apply(function1)
    print(df)

out:
    
           A         B                                               col1
    0 -1.274356 -1.732827                                                 []
    1  0.183624  0.137492                                                 []
    2  1.048061 -0.214123  [-1.7328268374352014, 0.13749207646596268, -0....
    3  1.718671 -2.005314  [0.13749207646596268, -0.21412311452744318, -2...
    4  0.778992  1.550759  [-0.21412311452744318, -2.005313805961139, 1.5...
    5  1.518509 -1.514495  [-2.005313805961139, 1.5507593790721148, -1.51...
    6 -0.944488 -0.321972  [1.5507593790721148, -1.514495405518893, -0.32...
    7 -1.316263  0.494681  [-1.514495405518893, -0.3219723279426748, 0.49...
    8  0.044054 -0.571182  [-0.3219723279426748, 0.49468113139453956, -0....
    9 -0.084098  1.143597  [0.49468113139453956, -0.5711819289255152, 1.1...

Pandas rolling window to return an array

7 Answers7

Linked