A very simple example just for understanding.

The goal is to calculate the values of a pandas DataFrame column depending on the results of a rolling function from another column.

I have the following DataFrame:

import numpy as np
import pandas as pd

s = pd.Series([1,2,3,2,1,2,3,2,1])    
df = pd.DataFrame({'DATA':s, 'POINTS':0})

df

[image: DataFrame start]

Note: I don't know how to format Jupyter Notebook output in the Stack Overflow edit window, so I pasted an image instead; I beg your pardon.

The DATA column holds the observed data; the POINTS column, initialized to 0, collects the output of a "rolling" function applied to the DATA column, as explained below.

Set a window size of 4:

nwin = 4

Just for this example, the "rolling" function calculates the max.

Now let me use a drawing to explain what I need.

[image: Algo flow]

At every iteration, the rolling function calculates the maximum of the data in the window; then the POINTS value at the index of that maximum DATA value is incremented by 1.

The final result is:

[image: DataFrame end]

Can you help me with the Python code?

I really appreciate your help.
Thank you in advance for your time,
Gilberto

P.S. Can you also suggest how to copy and paste a formatted Jupyter Notebook cell into the Stack Overflow edit window? Thank you.

Gilberto
  • Copy the output of `print(df)` in the edit window, and format it all as code (`{}` button in the toolbar). See also [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – IanS Sep 23 '16 at 09:35
  • "For every iteration, the rolling function calculate the maximum of the data in the window; then the POINT at the same index of the max DATA is incremented by 1." - I don't understand: isn't this incrementing `POINTS` by `(df.DATA.rolling(4).max() == df.DATA).astype(int)`? It doesn't fit your output example, though. – Ami Tavory Sep 23 '16 at 10:22
  • @AmiTavory, the way I understand it, the first three rolling windows have their max at index 2, so the value of `POINTS` at index 2 is incremented three times. The fourth rolling window no longer covers index 2, so the algorithm moves on, so to speak. An interesting problem, I'd say... – IanS Sep 23 '16 at 10:39
  • @AmiTavory you have understood correctly. – Gilberto Sep 23 '16 at 10:53
  • @IanS Thanks for the explanation. – Ami Tavory Sep 23 '16 at 10:55
  • @Gilberto I think you meant IanS. – Ami Tavory Sep 23 '16 at 10:57
  • Yes I'm very sorry. @IanS understood correctly. – Gilberto Sep 23 '16 at 11:07

2 Answers

IIUC the explanation by @IanS (thanks again!), you can do:

In [75]: np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)
Out[75]: array([0, 0, 3, 0, 0, 0, 3, 0, 0])

To update the column:

In [78]: df = pd.DataFrame({'DATA':s, 'POINTS':0})

In [79]: df.POINTS += np.array([df.DATA.rolling(4).max().shift(-i) == df.DATA for i in range(4)]).T.sum(axis=1)

In [80]: df
Out[80]: 
   DATA  POINTS
0     1       0
1     2       0
2     3       3
3     2       0
4     1       0
5     2       0
6     3       3
7     2       0
8     1       0
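
As a cross-check, here is a minimal sketch of the same flow written as an explicit loop over the windows. It assumes the default integer index from the question and resolves ties by giving the point to the first maximum in the window (the question does not say what should happen on ties). The vectorized expression above instead gives a point to every row that equals the window maximum, so the two can differ when a window contains repeated maxima.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
df = pd.DataFrame({'DATA': s, 'POINTS': 0})
nwin = 4

# Slide a window of nwin rows over DATA; each full window gives one point
# to the row holding its maximum (idxmax returns the first label on ties).
for start in range(len(df) - nwin + 1):
    window = df['DATA'].iloc[start:start + nwin]
    df.loc[window.idxmax(), 'POINTS'] += 1

print(df['POINTS'].tolist())  # [0, 0, 3, 0, 0, 0, 3, 0, 0]
```

The loop is slower than the vectorized version on long columns, but it follows the description in the question step by step, which makes it easy to swap in a different rolling function.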
Ami Tavory
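
import pandas as pd

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
df = pd.DataFrame({'DATA': s, 'POINTS': 0})

# Rolling max over 4 rows, shifted up one row so it lines up with the peaks.
df.POINTS = df.DATA.rolling(4).max().shift(-1)
# Keep the rolling max where it equals DATA; all other rows (and NaN) become 0.
# POINTS then holds the max value itself (3.0) at the peaks, not a hit count.
df.POINTS = (df.POINTS * (df.POINTS == df.DATA)).fillna(0)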
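
The snippet above writes the window maximum into POINTS, which agrees with the expected output for this data only because the maximum (3) happens to equal the number of windows it wins. Below is a minimal sketch that counts wins per row instead. It assumes NumPy 1.20 or later (for `sliding_window_view`), a default integer index, and that ties go to the first maximum in the window, which the question does not specify.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

s = pd.Series([1, 2, 3, 2, 1, 2, 3, 2, 1])
df = pd.DataFrame({'DATA': s, 'POINTS': 0})
nwin = 4

# One row per full window: shape (len(df) - nwin + 1, nwin).
windows = sliding_window_view(df['DATA'].to_numpy(), nwin)

# argmax is the position of the first max inside each window;
# adding the window's start offset turns it into a row position in df.
winners = windows.argmax(axis=1) + np.arange(len(windows))

# Count how many windows each row won.
df['POINTS'] = np.bincount(winners, minlength=len(df))

print(df['POINTS'].tolist())  # [0, 0, 3, 0, 0, 0, 3, 0, 0]
```

To use a different rolling function, replace `argmax` with whatever picks the winning position inside each window.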
ender85