1

For the following df, I want to calculate the cumulative sum of the column Inst_Dist and save as Cumu_Dist while the value of WDir_Deg stays the same. When the value in WDir_Deg changes, I need to restart the cumulative sum.

Therefore,

index | WDir_Deg | Inst_Dist | Cumu_Dist
0     | 289      | 20        | NaN
1     | 285      | 17        | NaN
2     | 285      | 19        | NaN
3     | 287      | 19        | NaN
4     | 289      | 10        | NaN

becomes

index | WDir_Deg | Inst_Dist | Cumu_Dist
0     | 289      | 20        | 20
1     | 285      | 17        | 17
2     | 285      | 19        | 36
3     | 287      | 19        | 19
4     | 289      | 10        | 10

My non-idiomatic (extremely slow) Python code is given below. I'd really appreciate if someone can guide me on how to make the code faster and idiomatic.

prev_angle = -1
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
    else:
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
Atif
  • 345
  • 1
  • 4
  • 16

2 Answers2

2

Use helper Series with compare WDir_Deg column for not equal by ne, shift and cumsum for consecutive groups and pass it to DataFrameGroupBy.cumsum:

s = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()
df['Cumu_Dist'] = df.groupby(s)['Inst_Dist'].cumsum()
print (df)
   WDir_Deg  Inst_Dist  Cumu_Dist
0       289         20         20
1       285         17         17
2       285         19         36
3       287         19         19
4       289         10         10

Detail:

print (s)
0    1
1    2
2    2
3    3
4    4
Name: WDir_Deg, dtype: int32
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Kind of tricky. Referencing this question/answers Pandas groupby cumulative sum

I made this solution

df['Cumu_Dist'] = df.groupby('WDir_Deg').Inst_Dist.cumsum()

Which returns

   index  WDir_Deg  Inst_Dist  Cumu_Dist
0      0       285         17         17
1      1       285         19         36
2      2       287         19         19
3      3       289         20         20

This uses pandas version 0.23.4

Ian Thompson
  • 2,914
  • 2
  • 18
  • 31
  • This does not work because all `WDir_Deg` values are grouped together and the change in `WDir_Deg` is neglected. I just updated the question data as well. Basically, when the `WDir_Deg` value changes, `Cumu_Dist` should start again, not continue with the previous cumulative sum. – Atif Feb 27 '19 at 09:30