2

I am calculating the value for the Total ‘1st’ Position column (table below) and would like to do this using multiple conditions.

I want Total ‘1st’ Position to reflect the number of times a given athlete has won a race (as of a given day).

For example... see below that Steve's Total 1st Position increments by 1 when: Athlete = Steve and Position = 1. I want to do this for all athletes.

enter image description here

I have tried the following...

df['Total 1st Position'] = ((df['Position'] == '1') & (df['Athlete'] == df['Athlete'])).cumsum()

...but this only returns a running sum of the number of times df['Position'] == '1'

What am I doing wrong?

Dongs14173
  • 185
  • 4
  • 14

2 Answers2

3

You can do it this way:

df = your_file

df.loc[(df['Position'] == 1), 'firsts'] = 1
df=df.fillna(0)

df['Total 1st Position'] = (df['firsts']*df['Position']).groupby(df['Athlete']).cumsum()

If we run your data frame through this we get the following:

   Race Day Athlete  Position  firsts  Total 1st Position
0     Day 1   Steve         1     1.0                 1.0
1     Day 1    Jane         2     0.0                 0.0
2     Day 1    Bill         3     0.0                 0.0
3     Day 2    Bill         1     1.0                 1.0
4     Day 2   Steve         2     0.0                 1.0
5     Day 2    Jane         3     0.0                 0.0
6     Day 3    Jane         1     1.0                 1.0
7     Day 3    Bill         2     0.0                 1.0
8     Day 3   Steve         3     0.0                 1.0
9     Day 4   Steve         1     1.0                 2.0
10    Day 4    Jane         2     0.0                 1.0
11    Day 4    Bill         3     0.0                 1.0
Paula Livingstone
  • 1,175
  • 1
  • 12
  • 21
2

Create a temporary column to indicate wins, then use .groupby with .cumsum on that:

df['won'] = (df['Position'] == '1') * 1
df['Total 1st Position'] = df.groupby('Athlete').won.cumsum()
Ken Wei
  • 3,020
  • 1
  • 10
  • 30