I'm trying to compute a rolling count of transactions per individual buyer in a dataset structured as follows:
userID itemID transaction_ts
3229 4493320 2016-01-02 14:55:00
3229 4492492 2016-01-02 14:57:02
3229 4496756 2016-01-04 09:01:18
3229 4493673 2016-01-04 09:11:10
3229 4497531 2016-01-04 11:05:25
3229 4495006 2016-01-05 07:25:11
4330 4500695 2016-01-02 09:17:21
4330 4500656 2016-01-03 09:19:28
4330 4503087 2016-01-04 07:42:15
4330 4501846 2016-01-04 08:55:24
4330 4504105 2016-01-04 09:59:35
Ideally, it would look like the below for a rolling transaction count window of e.g. 24 hours:
userID itemID transaction_ts rolling_count
3229 4493320 2016-01-02 14:55:00 1
3229 4492492 2016-01-02 14:57:02 2
3229 4496756 2016-01-04 09:01:18 1
3229 4493673 2016-01-04 09:11:10 2
3229 4497531 2016-01-04 11:05:25 3
3229 4495006 2016-01-05 07:25:11 4
4330 4500695 2016-01-02 09:17:21 1
4330 4500656 2016-01-03 09:19:28 1
4330 4503087 2016-01-04 07:42:15 2
4330 4501846 2016-01-04 08:55:24 3
4330 4504105 2016-01-04 09:59:35 3
There is an excellent answer to a similar problem here: pandas rolling sum of last five minutes
However, that answer relies solely on the timestamp field, whereas here the rolling count must reset to 1 whenever a row belongs to a different user than the row above it. A solution via per-user slicing is possible, but given the size of this dataset (potentially 1m+ rows) it is not feasible.
Crucially, the window should cover the 24-hour period prior to the transaction_ts of each row, which is why I think a custom df.apply or rolling window method is appropriate; I just can't figure out how to make it conditional on the userID.
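One possible sketch, assuming the column names from the question: combine groupby with a time-based rolling window. If transaction_ts is set as the index, rolling("24h") uses the timestamps rather than a fixed row count, and groupby("userID") keeps each user's window separate, so no explicit reset logic is needed:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "userID": [3229] * 6 + [4330] * 5,
    "itemID": [4493320, 4492492, 4496756, 4493673, 4497531, 4495006,
               4500695, 4500656, 4503087, 4501846, 4504105],
    "transaction_ts": pd.to_datetime([
        "2016-01-02 14:55:00", "2016-01-02 14:57:02", "2016-01-04 09:01:18",
        "2016-01-04 09:11:10", "2016-01-04 11:05:25", "2016-01-05 07:25:11",
        "2016-01-02 09:17:21", "2016-01-03 09:19:28", "2016-01-04 07:42:15",
        "2016-01-04 08:55:24", "2016-01-04 09:59:35",
    ]),
})

# Sort so the groupby/rolling result lines up row-for-row with df
df = df.sort_values(["userID", "transaction_ts"])

# Time-based rolling count within each user: the window for each row is
# the 24 hours ending at that row's transaction_ts (right-closed, so the
# row itself is included but a transaction exactly 24h earlier is not).
df["rolling_count"] = (
    df.set_index("transaction_ts")
      .groupby("userID")["itemID"]
      .rolling("24h")
      .count()
      .to_numpy()
      .astype(int)
)
print(df)
```

This avoids both df.apply and manual slicing, so it should scale to 1m+ rows; the .to_numpy() step is just to drop the (userID, transaction_ts) MultiIndex the grouped rolling produces before assigning back.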