24

Here is a sample code.

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
df['C'] = df.B.rolling(window=3)

Output:

           A         B                                       C
0 -0.108897  1.877987  Rolling [window=3,center=False,axis=0]
1 -1.276055 -0.424382  Rolling [window=3,center=False,axis=0]
2  1.578561 -1.094649  Rolling [window=3,center=False,axis=0]
3 -0.443294  1.683261  Rolling [window=3,center=False,axis=0]
4  0.674124  0.281077  Rolling [window=3,center=False,axis=0]
5  0.587773  0.697557  Rolling [window=3,center=False,axis=0]
6 -0.258038 -1.230902  Rolling [window=3,center=False,axis=0]
7 -0.443269  0.647107  Rolling [window=3,center=False,axis=0]
8  0.347187  0.753585  Rolling [window=3,center=False,axis=0]
9 -0.369179  0.975155  Rolling [window=3,center=False,axis=0]

I want my 'C' column to be an array like [0.1231, -1.132, 0.8766]. I tried using rolling apply but in vain.

Expected Output:

       A         B                 C
0 -0.108897  1.877987  []
1 -1.276055 -0.424382  []
2  1.578561 -1.094649  [-1.094649, -0.424382, 1.877987]
3 -0.443294  1.683261  [1.683261, -1.094649, -0.424382]
4  0.674124  0.281077  [0.281077, 1.683261, -1.094649]
5  0.587773  0.697557  [0.697557, 0.281077, 1.683261]
6 -0.258038 -1.230902  [-1.230902, 0.697557, 0.281077]
7 -0.443269  0.647107  [0.647107, -1.230902, 0.697557]
8  0.347187  0.753585  [0.753585, 0.647107, -1.230902]
9 -0.369179  0.975155  [0.975155, 0.753585, 0.647107]
revendar
  • 371
  • 2
  • 3
  • 12
  • 2
    No, this isn't possible. Each rolling window computation must return an aggregated result. If your function can't guarantee that, then you probably have to look at some other option. – cs95 Nov 25 '17 at 02:08
  • Thanks. Is it possible with loc/iloc/ix/others? I want to lookback a window and get the array. – revendar Nov 25 '17 at 04:17
  • I'd like to see your actual function and expected output for this data. What exactly are you trying to do? Obviously, I cannot help you unless I know what your real function does. – cs95 Nov 25 '17 at 04:21
  • Sure. I'm trying to multiply by an X factor through the array. For eg: [ 1, 2, 3, 4] will return [ 1.5, 3, 4.5, 6] when the factor is 1.5. Im trying to provide a clear dataframe – revendar Nov 25 '17 at 05:09
  • Alright. Why do you need a rolling window? How are each window computations related to each other? How do they fit into the final dataframe? Say, for a column of `[1, 2, 3, 4, 5, 6, 7, 8]` and `window_size=3`, what is your desired output? (This is very important information, I'd also like you to edit your question... thanks) – cs95 Nov 25 '17 at 05:11
  • Updated Question with expected output. I just need the window as an array. – revendar Nov 25 '17 at 05:34
  • One thing confuses me... and that is how A and B are used in your function. From where does the result come from? Do you do this striding separately for A? Is there another column D that should be the result of B's strides? Or is there only one column in the result? – cs95 Nov 25 '17 at 05:41
  • Updated Question to remove the function. Now the expectation is simple. 'C' should be an array of rolling window of B. – revendar Nov 25 '17 at 06:23
  • @cs95 It is no longer true, that you have to aggregate, see my answer at the bottom. – Philipp Dec 14 '20 at 09:56

7 Answers7

30

Since pandas 1.1 rolling objects are iterable.

For a list of lists:

df['C'] = [window.to_list() for window in df.B.rolling(window=3)]

For a Series of Series's do:

df['C'] = pd.Series(df.B.rolling(window=3))

Also checkout the rolling function for parameters.

Philipp
  • 1,191
  • 1
  • 14
  • 16
12

You could use np.stride_tricks:

import numpy as np
as_strided = np.lib.stride_tricks.as_strided  

df

          A         B
0 -0.272824 -1.606357
1 -0.350643  0.000510
2  0.247222  1.627117
3 -1.601180  0.550903
4  0.803039 -1.231291
5 -0.536713 -0.313384
6 -0.840931 -0.675352
7 -0.930186 -0.189356
8  0.151349  0.522533
9 -0.046146  0.507406

win = 3  # window size

# https://stackoverflow.com/a/47483615/4909087
v = as_strided(df.B, (len(df) - (win - 1), win), (df.B.values.strides * 2))

v
array([[ -1.60635669e+00,   5.10129842e-04,   1.62711678e+00],
       [  5.10129842e-04,   1.62711678e+00,   5.50902812e-01],
       [  1.62711678e+00,   5.50902812e-01,  -1.23129111e+00],
       [  5.50902812e-01,  -1.23129111e+00,  -3.13383794e-01],
       [ -1.23129111e+00,  -3.13383794e-01,  -6.75352179e-01],
       [ -3.13383794e-01,  -6.75352179e-01,  -1.89356194e-01],
       [ -6.75352179e-01,  -1.89356194e-01,   5.22532550e-01],
       [ -1.89356194e-01,   5.22532550e-01,   5.07405549e-01]])

df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])
df

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117  [-1.606356691642917, 0.0005101298424200881, 1....
3 -1.601180  0.550903  [0.0005101298424200881, 1.6271167809032248, 0....
4  0.803039 -1.231291  [1.6271167809032248, 0.5509028122535129, -1.23...
5 -0.536713 -0.313384  [0.5509028122535129, -1.2312911105674484, -0.3...
6 -0.840931 -0.675352  [-1.2312911105674484, -0.3133837943758246, -0....
7 -0.930186 -0.189356  [-0.3133837943758246, -0.6753521794378446, -0....
8  0.151349  0.522533  [-0.6753521794378446, -0.18935619377656243, 0....
9 -0.046146  0.507406  [-0.18935619377656243, 0.52253255045267, 0.507...
cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    Oh god I was here till now https://stackoverflow.com/a/46199050/4800652, thought of answering you already posted it – Bharath M Shetty Nov 25 '17 at 07:39
  • @Bharath yeah... I had some help from here: https://stackoverflow.com/a/47483615/4909087 – cs95 Nov 25 '17 at 07:41
  • Awesome Thanks! Does this work for forward looking as well? Like get next n rows as array? – revendar Nov 25 '17 at 07:43
  • @revendar Hmm... I'm not 100% sure but I think the stride_tricks code is applicable. After that, you just need to figure out how to insert it into your dataframe (for example, how would you shift the data and by how much). Hope that makes sense. – cs95 Nov 25 '17 at 07:44
  • @cᴏʟᴅsᴘᴇᴇᴅ what do think about zipping approach? – Bharath M Shetty Nov 25 '17 at 07:53
  • @revendar when you see an answer that helped you vote and appreciate it,just the way you accept the answer – Bharath M Shetty Nov 25 '17 at 07:55
  • I'm currently using shift as i need only few values. Will try using strides. – revendar Nov 25 '17 at 09:28
  • @revendar Use `strides` with caution, as it is very easy to reach invalid memory addresses as no stride checking is made by numpy. – cs95 Nov 25 '17 at 09:31
  • This does not work if each element of the initial array is also an array! – leoschet Sep 04 '18 at 10:51
  • @leoschet it's not meant to. That is quite possibly the worst way anyone could store their data. It kills all hope of vectorization. – cs95 Sep 04 '18 at 20:18
  • From what you said, I can't see the point of this question either. Time series forecasting usually uses a rolling window, since I'm dealing with a (time series) classification problem, I wanted to do the same as the question requested. Looking for a general solution, I was looking forward a piece of code which would work both with arrays of single elements and arrays of arrays. @coldspeed – leoschet Sep 04 '18 at 22:40
11

Let's using this pandas approach with a rolling apply trick:

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
list_of_values = []
df.B.rolling(3).apply(lambda x: list_of_values.append(x.values) or 0, raw=False)
df.loc[2:,'C'] = pd.Series(list_of_values).values
df

Output:

          A         B                                                                  C
0  1.610085  0.354823                                                                NaN
1 -0.241446 -0.304952                                                                NaN
2  0.524812 -0.240972  [0.35482336179318674, -0.30495156795594963, -0.24097191924555197]
3  0.767354  0.281625   [-0.30495156795594963, -0.24097191924555197, 0.2816249674055174]
4 -0.349844 -0.533781    [-0.24097191924555197, 0.2816249674055174, -0.5337811449574766]
5 -0.174189  0.133795     [0.2816249674055174, -0.5337811449574766, 0.13379518286397707]
6  2.799437 -0.978349    [-0.5337811449574766, 0.13379518286397707, -0.9783488211443795]
7  0.250129  0.289782     [0.13379518286397707, -0.9783488211443795, 0.2897823417165459]
8 -0.385259 -0.286399    [-0.9783488211443795, 0.2897823417165459, -0.28639931887491943]
9 -0.755363 -1.010891    [0.2897823417165459, -0.28639931887491943, -1.0108913605575793]
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Nice trick. It also works with `raw=True` and then you can append `x` directly without the need of casting the series into `np.array`. – mac13k Feb 02 '20 at 15:27
  • 1
    I mean this line: `df.B.rolling(3).apply(lambda x: list_of_values.append(x) or 0, raw=True)` – mac13k Feb 02 '20 at 15:28
5

In newer numpy versions there is a sliding_window_view().

It provides identical to as_strided() arrays, but with more transparent syntax.

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_window_view(x, 3)

>>>
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

But be aware that pandas rolling will add few nans (window_size - 1) at the start because it uses padding. You can check it like this:

x.rolling(3).sum()

>>>
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
5    15.0
6    18.0
7    21.0
8    24.0
dtype: float64

sliding_window_view(x, 3).sum(axis=1)
>>>
array([ 6,  9, 12, 15, 18, 21, 24])

So real corresponding array should be:

c = np.array([[nan, nan,  1.],
              [nan,  1.,  2.],
              [ 1.,  2.,  3.],
              [ 2.,  3.,  4.],
              [ 3.,  4.,  5.],
              [ 4.,  5.,  6.],
              [ 5.,  6.,  7.],
              [ 6.,  7.,  8.],
              [ 7.,  8.,  9.]])

c.sum(axis=1)
>>>
array([nan, nan,  6.,  9., 12., 15., 18., 21., 24.])
banderlog013
  • 2,207
  • 24
  • 33
2

Perhaps zipping would also help in your case i.e

def get_list(x,m) : return list(zip(*(x[i:] for i in range(m))))

# get_list(df['B'],3) would return 

[(-1.606357, 0.0005099999999999999, 1.627117),
 (0.0005099999999999999, 1.627117, 0.5509029999999999),
 (1.627117, 0.5509029999999999, -1.231291),
 (0.5509029999999999, -1.231291, -0.313384),
 (-1.231291, -0.313384, -0.6753520000000001),
 (-0.313384, -0.6753520000000001, -0.189356),
 (-0.6753520000000001, -0.189356, 0.522533),
 (-0.189356, 0.522533, 0.507406)]

df['C'] = pd.Series(get_list(df['B'],3), index=df.index[3 - 1:])
# Little help form @coldspeed

print(df)

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117       (-1.606357, 0.0005099999999999999, 1.627117)
3 -1.601180  0.550903  (0.0005099999999999999, 1.627117, 0.5509029999...
4  0.803039 -1.231291          (1.627117, 0.5509029999999999, -1.231291)
5 -0.536713 -0.313384         (0.5509029999999999, -1.231291, -0.313384)
6 -0.840931 -0.675352        (-1.231291, -0.313384, -0.6753520000000001)
7 -0.930186 -0.189356        (-0.313384, -0.6753520000000001, -0.189356)
8  0.151349  0.522533         (-0.6753520000000001, -0.189356, 0.522533)
9 -0.046146  0.507406                    (-0.189356, 0.522533, 0.507406)
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
0

Here is another way:

df.join(pd.concat(df['B'].rolling(window=3),axis=1).apply(lambda x: x.dropna().tolist()).reset_index(drop=True).loc[2:].rename('C'))
rhug123
  • 7,893
  • 1
  • 9
  • 24
0
    def function1(ss:pd.Series):
        ss
        if len(ss)<3:
            df.loc[ss.index.max(),'col1']='[]'
        else:
            df.loc[ss.index.max(),'col1']=str(ss.tolist())
        return 0
    
    df.B.rolling(3,1).apply(function1)
    print(df)

out:
    
           A         B                                               col1
    0 -1.274356 -1.732827                                                 []
    1  0.183624  0.137492                                                 []
    2  1.048061 -0.214123  [-1.7328268374352014, 0.13749207646596268, -0....
    3  1.718671 -2.005314  [0.13749207646596268, -0.21412311452744318, -2...
    4  0.778992  1.550759  [-0.21412311452744318, -2.005313805961139, 1.5...
    5  1.518509 -1.514495  [-2.005313805961139, 1.5507593790721148, -1.51...
    6 -0.944488 -0.321972  [1.5507593790721148, -1.514495405518893, -0.32...
    7 -1.316263  0.494681  [-1.514495405518893, -0.3219723279426748, 0.49...
    8  0.044054 -0.571182  [-0.3219723279426748, 0.49468113139453956, -0....
    9 -0.084098  1.143597  [0.49468113139453956, -0.5711819289255152, 1.1...
G.G
  • 639
  • 1
  • 5