For the following df, I want to calculate the cumulative sum of the column Inst_Dist and save it as Cumu_Dist while the value of WDir_Deg stays the same. When the value in WDir_Deg changes, I need to restart the cumulative sum.
Therefore,
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | NaN
1 | 285 | 17 | NaN
2 | 285 | 19 | NaN
3 | 287 | 19 | NaN
4 | 289 | 10 | NaN
becomes
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | 20
1 | 285 | 17 | 17
2 | 285 | 19 | 36
3 | 287 | 19 | 19
4 | 289 | 10 | 10
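To make this easier to reproduce, the sample frame above can be built roughly as follows. This is only a minimal sketch: the values are copied from the two tables, and the name `expected` is introduced here purely for illustration (it restates the desired Cumu_Dist column).

```python
import numpy as np
import pandas as pd

# Sample data copied from the tables above.
df = pd.DataFrame({
    "WDir_Deg": [289, 285, 285, 287, 289],
    "Inst_Dist": [20, 17, 19, 19, 10],
})
df["Cumu_Dist"] = np.nan  # not computed yet, as in the first table

# Desired Cumu_Dist values, copied from the second table
# (hypothetical helper name, used only for checking a result).
expected = [20, 17, 36, 19, 10]
```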
My non-idiomatic (and extremely slow) Python code is given below. I'd really appreciate it if someone could show me how to make the code faster and more idiomatic.
prev_angle = -1  # sentinel: assumes no real angle equals -1
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        # Same direction as the previous row: keep accumulating.
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
    else:
        # Direction changed: restart the running sum at this row.
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
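One direction that might avoid the Python-level loop, sketched under the assumption that a "run" is a stretch of consecutive rows with the same WDir_Deg: label each run by comparing the column with its shifted self, then take a cumulative sum of Inst_Dist within each label. The name `run_id` is hypothetical and used only for illustration; the sample data is copied from the tables above.

```python
import pandas as pd

# Sample data copied from the question's tables.
df = pd.DataFrame({
    "WDir_Deg": [289, 285, 285, 287, 289],
    "Inst_Dist": [20, 17, 19, 19, 10],
})

# A new run starts wherever WDir_Deg differs from the previous row.
# shift() puts NaN in front of the first row, so row 0 always starts a run.
# The running count of "changed" flags gives each run its own label.
run_id = (df["WDir_Deg"] != df["WDir_Deg"].shift()).cumsum()

# Cumulative sum of Inst_Dist, restarted at the beginning of every run.
df["Cumu_Dist"] = df["Inst_Dist"].groupby(run_id).cumsum()

print(df)
```

On the sample data this gives 20, 17, 36, 19, 10 for Cumu_Dist, matching the desired table. One difference from the loop worth keeping in mind: missing values in WDir_Deg would each start a new run here, because NaN never compares equal to NaN.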