How to calculate the values of a pandas DataFrame column depending on the results of a rolling function from another column

Question

A very simple example just for understanding.

The goal is to calculate the values of a pandas DataFrame column depending on the results of a rolling function from another column.

I have the following DataFrame:

import numpy as np
import pandas as pd

s = pd.Series([1,2,3,2,1,2,3,2,1])    
df = pd.DataFrame({'DATA':s, 'POINTS':0})

df

Note: I don't know how to format Jupyter Notebook output in the Stack Overflow edit window, so I copied and pasted an image instead; I beg your pardon.

The DATA column holds the observed data; the POINTS column, initialized to 0, collects the output of a "rolling" function applied to the DATA column, as explained below.

Set the window size to 4:

nwin = 4

Just for the example, the "rolling" function calculates the max.

Now let me use a drawing to explain what I need.

For every iteration, the rolling function calculates the maximum of the data in the window; then the POINTS value at the same index as the max DATA is incremented by 1.

The final result is:

   DATA  POINTS
0     1       0
1     2       0
2     3       3
3     2       0
4     1       0
5     2       0
6     3       3
7     2       0
8     1       0

Can you help me with the python code?

I really appreciate your help.
Thank you in advance for your time,
Gilberto

P.S. Can you also suggest how to copy and paste a formatted Jupyter Notebook cell into the Stack Overflow edit window? Thank you.

Copy the output of `print(df)` in the edit window, and format it all as code (`{}` button in the toolbar). See also [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). — IanS, Sep 23 '16 at 09:35
"For every iteration, the rolling function calculate the maximum of the data in the window; then the POINT at the same index of the max DATA is incremented by 1." - I don't understand: isn't this incrementing `POINTS` by `(df.DATA.rolling(4).max() == df.DATA).astype(int)`? It doesn't fit your output example, though. — Ami Tavory, Sep 23 '16 at 10:22
@AmiTavory, the way I understand it, the first three rolling windows have their max at index 2, so the value of `POINTS` at index 2 is incremented three times. The fourth rolling window no longer covers index 2, so the algorithm moves on, so to speak. An interesting problem, I'd say... — IanS, Sep 23 '16 at 10:39
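
The two readings discussed in these comments can be compared side by side. The following is only a minimal sketch, assuming that just the full windows count and that a tie within a window goes to the first maximum (the question states neither); the names `naive` and `points` are illustrative and not from the thread.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
df = pd.DataFrame({'DATA': s, 'POINTS': 0})
nwin = 4

# Reading 1: flag rows whose value equals the max of the window ending at that row.
naive = (df.DATA.rolling(nwin).max() == df.DATA).astype(int)

# Reading 2: for every full window, add one point at the position of its max.
# Assumes the default 0..n-1 index, so index labels double as list positions.
points = [0] * len(df)
for start in range(len(df) - nwin + 1):
    window = df.DATA.iloc[start:start + nwin]
    points[window.idxmax()] += 1  # idxmax returns the label of the first max
df['POINTS'] = points

print(naive.tolist())
print(df.POINTS.tolist())
```

On the example data, the first reading flags only index 6, while the second gives three points each to indices 2 and 6, which is what the description of the windows suggests.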

Ami Tavory · Accepted Answer · 2016-09-23T10:57:11.737

2

IIUC the explanation by @IanS (thanks again!), you can do

In [75]: np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)
Out[75]: array([0, 0, 3, 0, 0, 0, 3, 0, 0])

To update the column:

In [78]: df = pd.DataFrame({'DATA':s, 'POINTS':0})

In [79]: df.POINTS += np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)

In [80]: df
Out[80]: 
   DATA  POINTS
0     1       0
1     2       0
2     3       3
3     2       0
4     1       0
5     2       0
6     3       3
7     2       0
8     1       0
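
As a possible alternative, the same counts can be obtained by locating each window's maximum directly. This is only a sketch, assuming NumPy 1.20 or later (for `sliding_window_view`) and that a tie inside a window should go to the first maximum; the names `windows`, `winners` and `points` are illustrative. As far as I can tell, the comparison-based expression above would instead credit every position tied for a window's maximum, so the two can differ on data with ties.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
nwin = 4

# One row per full window: shape (len(s) - nwin + 1, nwin).
windows = sliding_window_view(s.to_numpy(), nwin)

# argmax gives the offset of the first max inside each window;
# adding the window's start position turns it into a position in s.
winners = windows.argmax(axis=1) + np.arange(len(windows))

# Count how many windows each position won.
points = np.bincount(winners, minlength=len(s))

df = pd.DataFrame({'DATA': s, 'POINTS': points})
print(df)
```

For the series in the question this gives 3 at indices 2 and 6 and 0 elsewhere, the same as the output above.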

edited Sep 23 '16 at 10:57

answered Sep 23 '16 at 09:38

Ami Tavory


Thank you very much @AmiTavory! My Python knowledge is still poor, but your answer helped me very much. – Gilberto Sep 23 '16 at 11:16
And thank you @IanS for helping to clarify my question. – Gilberto Sep 23 '16 at 11:16

score 1 · Answer 2 · answered Sep 23 '16 at 11:43


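import pandas as pd

s = pd.Series([1,2,3,2,1,2,3,2,1])
df = pd.DataFrame({'DATA':s, 'POINTS':0})

# Max of each 4-row window, shifted up one row so it lines up with the window's third element
df.POINTS=df.DATA.rolling(4).max().shift(-1)
# Keep that max (the value itself, not a count) where it equals DATA, 0 elsewhere; NaN rows become 0
df.POINTS=(df.POINTS*(df.POINTS==df.DATA)).fillna(0)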
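
A quick check on other data can show whether this approach counts windows or merely reproduces the example. The sketch below uses a made-up series (`s2`, not from the question) and an explicit loop as a reference, assuming a tie goes to the first maximum and only full windows count; `approach` and `reference` are illustrative names.

```python
import pandas as pd

nwin = 4
s2 = pd.Series([1, 5, 1, 1, 1])  # hypothetical data: the max sits at index 1

# The two-step approach from this answer, applied to s2.
shifted = s2.rolling(nwin).max().shift(-1)
approach = (shifted * (shifted == s2)).fillna(0)

# Reference: one point per full window, at the position of that window's max.
reference = [0] * len(s2)
for start in range(len(s2) - nwin + 1):
    reference[s2.iloc[start:start + nwin].idxmax()] += 1

print(approach.tolist())
print(reference)
```

Here the two disagree: the shifted comparison yields all zeros, while the loop gives 2 at index 1. That suggests the approach matches the question's data only because the maximum value (3) happens to equal the expected count there.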

answered Sep 23 '16 at 11:43

ender85

