
I need to add a column containing the count of rows since Time was 14:30. At each 14:30 the count restarts from zero. The times in this example can differ, so I cannot use a fixed variable to compare against.

I looked at Cumsum reset at NaN and cumsum reset at NaN_2, but I just cannot figure it out: my values are Time-based, not NaN-based. I'd appreciate any help with this.

Date         Time       Open    High    Low     Last
28/05/2018   14:30:00   1.16167 1.16252 1.1613  1.16166
28/05/2018   15:00:00   1.16166 1.16287 1.16159 1.16276
28/05/2018   15:30:00   1.16277 1.16293 1.16177 1.16212
28/05/2018   16:00:00   1.16213 1.16318 1.16198 1.16262
28/05/2018   16:30:00   1.16262 1.16298 1.16258 1.16284
28/05/2018   17:00:00   1.16285 1.16329 1.16264 1.16265
28/05/2018   17:30:00   1.16266 1.163   1.16243 1.16289
28/05/2018   18:00:00   1.16288 1.1629  1.16228 1.16269
28/05/2018   18:30:00   1.16269 1.16278 1.16264 1.16274
28/05/2018   19:00:00   1.16275 1.16277 1.1627  1.16275
28/05/2018   19:30:00   1.16276 1.16284 1.1627  1.1628
28/05/2018   20:00:00   1.16279 1.16288 1.16264 1.16278
28/05/2018   20:30:00   1.16278 1.16289 1.1626  1.16265
28/05/2018   21:00:00   1.16267 1.1627  1.16251 1.16262
29/05/2018   14:30:00   1.15793 1.15827 1.15714 1.15786
29/05/2018   15:00:00   1.15785 1.159   1.15741 1.15814
29/05/2018   15:30:00   1.15813 1.15813 1.15601 1.15647
29/05/2018   16:00:00   1.15647 1.15658 1.15451 1.15539
29/05/2018   16:30:00   1.15539 1.15601 1.15418 1.1551
29/05/2018   17:00:00   1.15508 1.15599 1.15463 1.15527
29/05/2018   17:30:00   1.15528 1.15587 1.15442 1.15465
29/05/2018   18:00:00   1.15465 1.15469 1.15196 1.15261
29/05/2018   18:30:00   1.15261 1.15441 1.15261 1.15349
29/05/2018   19:00:00   1.15348 1.15399 1.15262 1.15399
29/05/2018   19:30:00   1.154   1.15412 1.15239 1.15322
29/05/2018   20:00:00   1.15322 1.15373 1.15262 1.15367
29/05/2018   20:30:00   1.15367 1.15419 1.15351 1.15367
29/05/2018   21:00:00   1.15366 1.15438 1.15352 1.15354
29/05/2018   21:30:00   1.15355 1.15355 1.15354 1.15354
30/05/2018   14:30:00   1.16235 1.16323 1.16133 1.16161
30/05/2018   15:00:00   1.16162 1.16193 1.1602  1.16059
Grantx

2 Answers


Try groupby().cumcount() on blocks formed with a cumsum:

# blocks starting with `14:30:00`
# print to see the blocks
blocks = df.Time.eq('14:30:00').cumsum()

# enumerate the rows within each block with `groupby`
df['count_1430'] = df.groupby(blocks).cumcount()

Output:

          Date      Time     Open     High      Low     Last  count_1430
0   28/05/2018  14:30:00  1.16167  1.16252  1.16130  1.16166           0
1   28/05/2018  15:00:00  1.16166  1.16287  1.16159  1.16276           1
2   28/05/2018  15:30:00  1.16277  1.16293  1.16177  1.16212           2
3   28/05/2018  16:00:00  1.16213  1.16318  1.16198  1.16262           3
4   28/05/2018  16:30:00  1.16262  1.16298  1.16258  1.16284           4
5   28/05/2018  17:00:00  1.16285  1.16329  1.16264  1.16265           5
6   28/05/2018  17:30:00  1.16266  1.16300  1.16243  1.16289           6
7   28/05/2018  18:00:00  1.16288  1.16290  1.16228  1.16269           7
8   28/05/2018  18:30:00  1.16269  1.16278  1.16264  1.16274           8
9   28/05/2018  19:00:00  1.16275  1.16277  1.16270  1.16275           9
10  28/05/2018  19:30:00  1.16276  1.16284  1.16270  1.16280          10
11  28/05/2018  20:00:00  1.16279  1.16288  1.16264  1.16278          11
12  28/05/2018  20:30:00  1.16278  1.16289  1.16260  1.16265          12
13  28/05/2018  21:00:00  1.16267  1.16270  1.16251  1.16262          13
14  29/05/2018  14:30:00  1.15793  1.15827  1.15714  1.15786           0
15  29/05/2018  15:00:00  1.15785  1.15900  1.15741  1.15814           1
16  29/05/2018  15:30:00  1.15813  1.15813  1.15601  1.15647           2
17  29/05/2018  16:00:00  1.15647  1.15658  1.15451  1.15539           3
18  29/05/2018  16:30:00  1.15539  1.15601  1.15418  1.15510           4
19  29/05/2018  17:00:00  1.15508  1.15599  1.15463  1.15527           5
20  29/05/2018  17:30:00  1.15528  1.15587  1.15442  1.15465           6
21  29/05/2018  18:00:00  1.15465  1.15469  1.15196  1.15261           7
22  29/05/2018  18:30:00  1.15261  1.15441  1.15261  1.15349           8
23  29/05/2018  19:00:00  1.15348  1.15399  1.15262  1.15399           9
24  29/05/2018  19:30:00  1.15400  1.15412  1.15239  1.15322          10
25  29/05/2018  20:00:00  1.15322  1.15373  1.15262  1.15367          11
26  29/05/2018  20:30:00  1.15367  1.15419  1.15351  1.15367          12
27  29/05/2018  21:00:00  1.15366  1.15438  1.15352  1.15354          13
28  29/05/2018  21:30:00  1.15355  1.15355  1.15354  1.15354          14
29  30/05/2018  14:30:00  1.16235  1.16323  1.16133  1.16161           0
30  30/05/2018  15:00:00  1.16162  1.16193  1.16020  1.16059           1
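To make the mechanics concrete, here is a minimal, self-contained sketch of how the `blocks` marker behaves, using toy data (assuming `Time` is stored as strings, as in the question):

```python
import pandas as pd

# toy frame: two sessions, each starting at 14:30:00
df = pd.DataFrame({
    'Time': ['14:30:00', '15:00:00', '15:30:00',
             '14:30:00', '15:00:00']
})

# each 14:30 row bumps the cumsum, giving every session its own label
blocks = df.Time.eq('14:30:00').cumsum()
print(blocks.tolist())             # [1, 1, 1, 2, 2]

# cumcount enumerates rows within each label, restarting at 0
df['count_1430'] = df.groupby(blocks).cumcount()
print(df['count_1430'].tolist())   # [0, 1, 2, 0, 1]
```

The key point is that `groupby` accepts any aligned Series as the grouping key, so `blocks` never needs to become a real column.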
Quang Hoang

These kinds of operations are pretty difficult to do elegantly in my experience. I think your best option is to use the iterrows (doc) method. It lets you iterate over the rows of the dataframe in a for loop, so you can apply your logic manually.

Something like this:

# I'm assuming there's some sort of base case, so start there
count = -1
# iterrows yields (index, row) pairs, so unpack both
for idx, row in df.iterrows():
    if row['Time'] == '14:30:00':  # Time is stored as a string here
        count = 0
    elif count != -1:
        count += 1
    # ... do whatever you need to with the count

It's not the prettiest solution, and the docs specifically warn against modifying a dataframe you're iterating over, but since you're not adding or removing any rows you should be okay, and I don't know of another way to do it.
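If the goal is to store the running count as a new column, the loop above can be wired up like this (a sketch on toy data; it accumulates into a list and assigns once at the end, which sidesteps the modify-while-iterating caveat):

```python
import pandas as pd

# toy frame with two 14:30 sessions
df = pd.DataFrame({
    'Time': ['14:30:00', '15:00:00', '14:30:00', '15:00:00', '15:30:00']
})

count = -1          # -1 means "no 14:30 seen yet"
counts = []
for _, row in df.iterrows():
    if row['Time'] == '14:30:00':
        count = 0   # reset at each session start
    elif count != -1:
        count += 1
    counts.append(count)

df['count_1430'] = counts
print(df['count_1430'].tolist())   # [0, 1, 0, 1, 2]
```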

Canadian_Marine
  • Thanks for answering. I was hoping to avoid looping and Quang Hoang came up with a pretty elegant solution. – Grantx Feb 22 '21 at 15:48