Python Pandas: Cumulative Sum based on multiple conditions

Question

I am calculating the value for the Total ‘1st’ Position column (table below) and would like to do this using multiple conditions.

I want Total ‘1st’ Position to reflect the number of times a given athlete has won a race (as of a given day).

For example... see below that Steve's Total 1st Position increments by 1 when: Athlete = Steve and Position = 1. I want to do this for all athletes.

I have tried the following...

df['Total 1st Position'] = ((df['Position'] == '1') & (df['Athlete'] == df['Athlete'])).cumsum()

...but this only returns a running sum of the number of times df['Position'] == '1'

What am I doing wrong?

Can you post your dataframe in text format please? – cs95 Nov 09 '17 at 01:01 — cs95, Nov 09 '17 at 01:01

score 3 · Answer 1 · answered Nov 09 '17 at 02:50

You can do it this way:

df = your_file

df.loc[(df['Position'] == 1), 'firsts'] = 1
df=df.fillna(0)

df['Total 1st Position'] = (df['firsts']*df['Position']).groupby(df['Athlete']).cumsum()

If we run your data frame through this we get the following:

   Race Day Athlete  Position  firsts  Total 1st Position
0     Day 1   Steve         1     1.0                 1.0
1     Day 1    Jane         2     0.0                 0.0
2     Day 1    Bill         3     0.0                 0.0
3     Day 2    Bill         1     1.0                 1.0
4     Day 2   Steve         2     0.0                 1.0
5     Day 2    Jane         3     0.0                 0.0
6     Day 3    Jane         1     1.0                 1.0
7     Day 3    Bill         2     0.0                 1.0
8     Day 3   Steve         3     0.0                 1.0
9     Day 4   Steve         1     1.0                 2.0
10    Day 4    Jane         2     0.0                 1.0
11    Day 4    Bill         3     0.0                 1.0

Ken Wei · Answer 2 · 2017-11-09T02:43:08.040

2

Create a temporary column to indicate wins, then use .groupby with .cumsum on that:

df['won'] = (df['Position'] == '1') * 1
df['Total 1st Position'] = df.groupby('Athlete').won.cumsum()

edited Nov 09 '17 at 02:43

answered Nov 09 '17 at 02:35

Ken Wei

3,020
1
10
30

Thanks for this :-) How as a next step could I count the respective number of times each runner raced, on a cumulative basis? – Dongs14173 Nov 09 '17 at 20:02
Use `.cumcount()` instead: `df.groupby('Athlete').cumcount()` – Ken Wei Nov 10 '17 at 01:58

Python Pandas: Cumulative Sum based on multiple conditions

2 Answers2

Linked