
I have a dataset with values in column A representing a score. I want to calculate a column B that shows for how many consecutive rows the score has been equal to or higher than in the previous row. If the value is lower than in the previous row, column B for that row should be set to 0.

I tried the following:

import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

df['B'] = 0
# intended: where A >= previous A, set B to (previous B) + 1
df.loc[df['A'].diff() >= 0, 'B'] = df['B'].shift() + 1

The result is:

    A       B
0   140     0.0
1   145     1.0
2   148     1.0
3   150     1.0
4   100     0.0
5   105     1.0
6   106     1.0

I understand that the condition is checked for each row, but the right-hand side is evaluated for all rows at once. So B is not incremented as expected: when `df['B'].shift()` is computed, every value in B is still 0, so each matching row just gets 0 + 1.
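The dependency described above becomes explicit if the intended logic is written as a plain loop: each row needs the B value that was *already computed* for the row before it, which a single vectorized assignment never sees. A minimal sketch, shown only to illustrate the row-to-row dependency (a Python-level loop like this is slow on large frames):

```python
import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

# Row-by-row version of the intended logic: each row reads the B value
# that was computed for the previous row in the previous iteration.
b = [0] * len(df)
for i in range(1, len(df)):
    if df['A'].iat[i] >= df['A'].iat[i - 1]:
        b[i] = b[i - 1] + 1
    # otherwise b[i] stays 0, i.e. the streak is reset
df['B'] = b
print(df)
```

This produces the desired column, but it trades vectorization for an explicit sequential dependency.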

How can I get the following result?

    A       B
0   140     0
1   145     1
2   148     2
3   150     3
4   100     0
5   105     1
6   106     2
fleshstorm
  • Does this answer your question? [Python pandas cumsum with reset everytime there is a 0](https://stackoverflow.com/questions/45964740/python-pandas-cumsum-with-reset-everytime-there-is-a-0) – G. Anderson Feb 28 '20 at 16:59
  • I came up with a solution similar to @G.Anderson's link, so this question qualifies as a duplicate: `m = df.A.diff().ge(0); df['B'] = m.cumsum() - m.cumsum().where(~m).ffill()` – Andy L. Feb 28 '20 at 17:56
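The one-liner in the comment above can be unpacked into a runnable sketch. The intermediate names `c` and `offset` are introduced here only for readability, and the final `astype(int)` is an addition, since the `where`/`ffill` step produces floats:

```python
import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

# m is True where A >= the previous row (row 0 compares against NaN -> False)
m = df['A'].diff().ge(0)
# running count of True values over the whole column
c = m.cumsum()
# at each False (a reset), keep the running count; carry it forward otherwise
offset = c.where(~m).ffill()
# subtracting the carried-forward count restarts B at 0 after every reset
df['B'] = (c - offset).astype(int)
print(df)
```

Because row 0 is always False, `offset` never has a leading NaN, so the integer cast is safe for this data.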

1 Answer


You need `cumsum` to identify the blocks and `groupby().cumcount()` to enumerate the rows within each block:

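# True where A >= the previous row (row 0 compares against NaN -> False)
s = df['A'].diff().ge(0)
# every False starts a new block; number the rows within each block from 0
df['B'] = s.groupby((~s).cumsum()).cumcount()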

Output:

     A  B
0  140  0
1  145  1
2  148  2
3  150  3
4  100  0
5  105  1
6  106  2
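To see how the blocks are formed, the intermediate Series can be kept and printed alongside the result. A small sketch; the name `block` and the extra printed columns are illustrative additions, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({'A': [140, 145, 148, 150, 100, 105, 106]})

s = df['A'].diff().ge(0)               # True where A >= previous row
block = (~s).cumsum()                  # increments at every False, so each streak gets its own id
df['B'] = s.groupby(block).cumcount()  # 0, 1, 2, ... within each block

# Intermediate columns, shown only to illustrate the grouping
print(df.assign(s=s, block=block))
```

Here `block` is 1 for rows 0 to 3 and 2 for rows 4 to 6. Since every block starts at a row where `s` is False, `cumcount()` restarts at 0 exactly where the score drops.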
Quang Hoang