3

How do I obtain the rolling values of some length n from a pandas Series?

For example, if I have the following:

df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})

How do I obtain the moving windows of length n? For example, with n=3, something like:

[NaN, NaN, 0], [NaN, 0, 1],..., [4, 8.8, 7.12]

EDIT: If I use pandas rolling, as:

roll = df.rolling(3).mean()

then roll holds the moving averages of the series. I do not want the average of each moving set of 3 values; I want the sets of 3 values themselves.

  • Can you explain a bit more? It isn't clear how you get this output. – cs95 Feb 20 '18 at 11:12
  • I do not get any output, I would like to get this output, which is the purpose of my question. Maybe it should be 'None' instead of 'NaN' for the two first rolling lists, I do not know. –  Feb 20 '18 at 11:12
  • I mean that it isn't clear _how to_ get this output. Why are there leading NaNs in the first two rows? – cs95 Feb 20 '18 at 11:12
  • How you get from your input to your (desired) output is not clear. – IanS Feb 20 '18 at 11:13
  • Please read the question, and avoid downgrading if you are not able to understand it. –  Feb 20 '18 at 11:26

3 Answers

6

I think you first need to prepend NaNs and then apply this strided rolling-window solution:

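import numpy as np

N = 3
# Prepend N-1 NaNs so the first windows are padded on the left.
x = np.concatenate([[np.nan] * (N - 1), df['temperature'].values])

def rolling_window(a, window):
    # Build a strided view of a: one row per window, no data is copied.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    # writeable=False guards against writes through the overlapping view.
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)

print(rolling_window(x, N))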
[[  nan   nan  0.  ]
 [  nan  0.    1.  ]
 [ 0.    1.    2.  ]
 [ 1.    2.     nan]
 [ 2.     nan  4.  ]
 [  nan  4.    2.  ]
 [ 4.    2.    0.8 ]
 [ 2.    0.8   4.  ]
 [ 0.8   4.    8.8 ]
 [ 4.    8.8   7.12]]
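
On NumPy 1.20 or later, the same windows can probably be obtained without hand-written strides. The following is only a sketch, assuming that `numpy.lib.stride_tricks.sliding_window_view` is available in your NumPy version; the names `temperature`, `padded` and `windows` are illustrative and do not come from the answer above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same data as in the question, as a plain NumPy array.
temperature = np.array([0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12])
N = 3

# Left-pad with N-1 NaNs so that every original value ends a full window.
padded = np.concatenate([np.full(N - 1, np.nan), temperature])

# Read-only view with one window per row; no data is copied.
windows = sliding_window_view(padded, N)
print(windows)
```

The result is a read-only view, so call `windows.copy()` first if you need to modify the windows.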
jezrael
0

Even though the thread is old, maybe this will help someone else. I'm a beginner, but I solved user5805065's question with the following procedure. Maybe someone can make it more elegant and efficient.

  • Convert the pandas Series to a NumPy array:
rollTemperature = df['temperature'].values
  • Then fill a NumPy array in a for loop, using some initial variables:
period = 2
stop = len(rollTemperature)
diffRoll = np.zeros(stop)

for i in range(stop):
    if i == 0 or i % period != 0:
        # No value for the first row or for rows that are not a multiple of period.
        diffRoll[i] = np.nan
    else:
        # Mean of the current value and the value period rows earlier.
        diffRoll[i] = (rollTemperature[i] + rollTemperature[i - period]) / 2
  • Then add the NumPy array to the existing DataFrame:
df['diffRoll'] = diffRoll

Then the output is:

   temperature  diffRoll
0         0.00       NaN
1         1.00       NaN
2         2.00       1.0
3          NaN       NaN
4         4.00       3.0
5         2.00       NaN
6         0.80       2.4
7         4.00       NaN
8         8.80       4.8
9         7.12       NaN
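
The loop above can likely be replaced by vectorized pandas operations. This is a minimal sketch under the assumption that the DataFrame has the default 0..n-1 index; the helper names `s`, `pair_mean`, `positions` and `keep` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temperature': [0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12]})
period = 2

s = df['temperature']
# Mean of each value and the value `period` rows earlier.
pair_mean = (s + s.shift(period)) / 2

# Keep only rows whose position is a positive multiple of `period`;
# every other row becomes NaN, as in the loop version.
positions = np.arange(len(s))
keep = (positions > 0) & (positions % period == 0)
df['diffRoll'] = pair_mean.where(keep)
print(df)
```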
0
# df1 is not defined in this thread; it is assumed to hold the question's temperature data.
pd.concat([df1.shift(i) for i in range(3)], axis=1).loc[:, ::-1]\
    .agg(list, axis=1)

0     [nan, nan, 0.0]
1     [nan, 0.0, 1.0]
2     [0.0, 1.0, 2.0]
3     [1.0, 2.0, nan]
4     [2.0, nan, 4.0]
5     [nan, 4.0, 2.0]
6     [4.0, 2.0, 0.8]
7     [2.0, 0.8, 4.0]
8     [0.8, 4.0, 8.8]
9    [4.0, 8.8, 7.12]
dtype: object
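
To spell out the idea: each `shift(k)` moves the series down by k rows, so placing the shifted copies side by side, oldest first, lines up the n values that end at each row. The sketch below wraps this in a function for an arbitrary window length; `rolling_lists` is a hypothetical helper name, and it builds the lists through NumPy rather than `agg`, which is a swapped-in detail and not part of the answer above.

```python
import numpy as np
import pandas as pd

def rolling_lists(series, n):
    """Return a Series whose i-th entry lists the n values ending at row i."""
    # shift(n - 1) is the oldest value in each window, shift(0) the newest.
    shifted = [series.shift(k) for k in range(n - 1, -1, -1)]
    wide = pd.concat(shifted, axis=1)
    # Turn each row of the wide frame into a plain Python list.
    return pd.Series(wide.to_numpy().tolist(), index=series.index)

s = pd.Series([0, 1, 2, np.nan, 4, 2, 0.8, 4, 8.8, 7.12], name='temperature')
windows = rolling_lists(s, 3)
print(windows)
```

The first n-1 entries are padded with NaN on the left, because `shift` has no earlier rows to draw from.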
G.G