0

I have a dataframe with a column that consists of datetime values, one that consists of speed values, and one that consists of timedelta values between the rows.

I would want to get the cumulative sum of timedeltas whenever the speed is below 2 knots. When the speed rises above 2 knots, I would like this cumulative sum to reset to 0, and then to start summing at the next instance of speed observations below 2 knots.

I have started by flagging all observations of speed values < 2. I only manage to get the cumulative sum for all of the observations with speed < 2, but not a cumulative sum separated for each instance.

The dataframe looks like this, and cum_sum is the desired output:

datetime            speed      timedelta     cum_sum     flag
1-1-2019 19:30:00    0.5           0            0          1
1-1-2019 19:32:00    0.7           2            2          1
1-1-2019 19:34:00    0.1           2            4          1
1-1-2019 19:36:00    5.0           2            0          0
1-1-2019 19:38:00    25.0          2            0          0
1-1-2019 19:42:00    0.1           4            4          1
1-1-2019 19:49:00    0.1           7            11         1
Georgy
  • 12,464
  • 7
  • 65
  • 73
Kaangy
  • 15
  • 4

2 Answers2

1

You can use the method from "How to groupby consecutive values in pandas DataFrame" to get the groups where flag is either 1 or 0, and then you will just need to apply the cumsum on the timedelta column, and set those values where flag == 0 to 0:

gb = df.groupby((df['flag'] != df['flag'].shift()).cumsum())
df['cum_sum'] = gb['timedelta'].cumsum()
df.loc[df['flag'] == 0, 'cum_sum'] = 0
print(df)

will give

            datetime    speed   timedelta   flag   cum_sum
0   1-1-2019 19:30:00   0.5     0           1      0
1   1-1-2019 19:32:00   0.7     2           1      2
2   1-1-2019 19:34:00   0.1     2           1      4
3   1-1-2019 19:36:00   5.0     2           0      0
4   1-1-2019 19:38:00   25.0    2           0      0
5   1-1-2019 19:42:00   0.1     4           1      4
6   1-1-2019 19:49:00   0.1     7           1      11
Georgy
  • 12,464
  • 7
  • 65
  • 73
0

Note: Uses global variable

c = 0
def fun(x):
    global c    
    if x['speed'] > 2.0:
        c = 0
    else:
        c = x['timedelta']+c
    return c

df = pd.DataFrame( {'datetime': ['1-1-2019 19:30:00']*7,
    'speed': [0.5,.7,0.1,5.0,25.0,0.1,0.1], 'timedelta': [0,2,2,2,2,4,7]})

df['cum_sum']=df.apply(fun, axis=1)
            datetime    speed   timedelta   cum_sum
0   1-1-2019 19:30:00   0.5     0           0
1   1-1-2019 19:30:00   0.7     2           2
2   1-1-2019 19:30:00   0.1     2           4
3   1-1-2019 19:30:00   5.0     2           0
4   1-1-2019 19:30:00   25.0    2           0
5   1-1-2019 19:30:00   0.1     4           4
6   1-1-2019 19:30:00   0.1     7           11
mujjiga
  • 16,186
  • 2
  • 33
  • 51