I have a pandas DataFrame in Python with three columns: `date_time`, holding the date and time as a datetime object; `date`, holding the date as a string; and a scalar measurement `t`. I want to calculate the cumulative value of `t` over each day and reset it to zero at the start of the next day. Is there a simple way to do this? I am currently grouping the dataframe with `df.groupby('date')`.
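For illustration, a frame with that structure might look like the following sketch (column names as described above; the values are made up):

import pandas as pd

# hypothetical data with the structure described in the question
df = pd.DataFrame({
    'date_time': pd.to_datetime(['2019-12-24 01:00', '2019-12-24 02:30',
                                 '2019-12-25 02:00', '2019-12-25 04:25']),
    'date': ['2019-12-24', '2019-12-24', '2019-12-25', '2019-12-25'],
    't': [1.5, 2.0, 0.5, 3.0],
})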

user3607022
- Please provide an example of your dataframe and desired result – Stepan Dec 25 '19 at 17:33
- I did provide a complete description. – user3607022 Dec 25 '19 at 17:44
- @user3607022 you _didn't_ provide a [mcve] – Umar.H Dec 25 '19 at 17:45
- I provided a short and reproducible description. – user3607022 Dec 25 '19 at 17:48
- You need to provide the data you are working with so we can reproduce your data/error and give you a solution based on your sample output. – Umar.H Dec 25 '19 at 17:49
- There is no error to reproduce. – user3607022 Dec 25 '19 at 17:50
- But there is _data_ that needs to be reproduced; have a read of [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your question – Umar.H Dec 25 '19 at 17:52
- It's a theoretical question, and as such there is no data available. Please avoid extended discussions in the comment section. – user3607022 Dec 25 '19 at 17:55
2 Answers
A cumulative sum that resets is equivalent to applying the cumulative sum to groups: each new group restarts the sum from zero. It is always easier to illustrate an answer with a good minimal reproducible example:
import pandas as pd

df = pd.DataFrame([
    ['20191224', '20191224 01:00', 50, 'Merry'],
    ['20191224', '20191224 02:30', 50, 'Christmas'],
    ['20191225', '20191225 02:00', 50, 'Merry'],
    ['20191225', '20191225 04:25', 50, 'Christmas'],
    ['20191225', '20191225 06:50', 50, ':)']],
    columns=['date_str', 'date_time', 'quantity', 'msg'])
First, make sure the 'date_time' column is cast to actual timestamps (all supported time formats are listed in the documentation):
df['date_time'] = pd.to_datetime(df['date_time'], format = "%Y%m%d %H:%M")
Then make sure your rows are ordered by time (important, since cumsum depends on row order):
df = df.sort_values('date_time')
You can group by date_str, since it represents your daily groups. A plain aggregation, for example, collapses each day to a single row:
df.groupby('date_str').agg({
    'quantity': 'sum',
    'msg': lambda x: ' '.join(x)
})
          quantity                  msg
date_str
20191224       100      Merry Christmas
20191225       150   Merry Christmas :)
In your case, what you want instead is to transform with cumsum, which keeps one value per row rather than one per group:
df['daily_cum_quantity'] = df.groupby('date_str')['quantity'].transform('cumsum')
Resulting in:
   date_str           date_time  quantity        msg  daily_cum_quantity
0  20191224 2019-12-24 01:00:00        50      Merry                  50
1  20191224 2019-12-24 02:30:00        50  Christmas                 100
2  20191225 2019-12-25 02:00:00        50      Merry                  50
3  20191225 2019-12-25 04:25:00        50  Christmas                 100
4  20191225 2019-12-25 06:50:00        50         :)                 150
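As a side note, if you don't want to rely on a separate string column, the same daily grouping can be derived from the timestamp itself. A minimal sketch, assuming the df above (the column name cum_from_ts is arbitrary):

# group on the calendar date extracted from the timestamp, then take the running sum
df['cum_from_ts'] = df.groupby(df['date_time'].dt.date)['quantity'].cumsum()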

Vincent
# cumulative t within each day; the grouped cumsum keeps df's index, so merge it back on the index
temp = df.groupby('date')['t'].cumsum().rename('t_cumulative')
df = df.merge(temp, left_index=True, right_index=True, how='left')
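Since the grouped cumsum is already aligned with df's index, a plain column assignment (no merge) should give the same result; a minimal equivalent sketch:

# index alignment makes the merge unnecessary
df['t_cumulative'] = df.groupby('date')['t'].cumsum()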

user3607022