I have a pandas DataFrame in Python with three columns: `date_time`, holding the date and time as a datetime object; `date`, holding the date as a string; and a scalar measurement `t`. I want to calculate the cumulative value of `t` over each day and reset it to zero at the start of the next day. Is there a simple way to do this? I am currently grouping the dataframe with `df.groupby('date')`.
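For illustration, a frame with that structure might look like the following sketch (column names as described above; the values are made up):

import pandas as pd

# hypothetical data with the structure described in the question
df = pd.DataFrame({
    'date_time': pd.to_datetime(['2019-12-24 01:00', '2019-12-24 02:30',
                                 '2019-12-25 02:00', '2019-12-25 04:25']),
    'date': ['2019-12-24', '2019-12-24', '2019-12-25', '2019-12-25'],
    't': [1.5, 2.0, 0.5, 3.0],
})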

user3607022
- Please provide an example of your dataframe and desired result – Stepan Dec 25 '19 at 17:33
- I did provide a complete description. – user3607022 Dec 25 '19 at 17:44
- @user3607022 you _didn't_ provide a [mcve] – Umar.H Dec 25 '19 at 17:45
- I provided a short and reproducible description. – user3607022 Dec 25 '19 at 17:48
- You need to provide the data you are working with so we can reproduce your data/error and give you a solution based on your sample output. – Umar.H Dec 25 '19 at 17:49
- There is no error to reproduce. – user3607022 Dec 25 '19 at 17:50
- But there is _data_ that needs to be reproduced; have a read of [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your question – Umar.H Dec 25 '19 at 17:52
- It's a theoretical question, and as such there is no data available. Please avoid extended discussions in the comment section. – user3607022 Dec 25 '19 at 17:55
2 Answers
A cumulative sum that resets is equivalent to applying the cumulative sum to groups: each new group restarts the sum from zero. It is always easier to illustrate an answer with a good minimal reproducible example:
import pandas as pd

df = pd.DataFrame([
    ['20191224', '20191224 01:00', 50, 'Merry'],
    ['20191224', '20191224 02:30', 50, 'Christmas'],
    ['20191225', '20191225 02:00', 50, 'Merry'],
    ['20191225', '20191225 04:25', 50, 'Christmas'],
    ['20191225', '20191225 06:50', 50, ':)']],
    columns=['date_str', 'date_time', 'quantity', 'msg'])
First, make sure the 'date_time' column is cast to actual timestamps (all supported time formats are listed in the documentation):
df['date_time'] = pd.to_datetime(df['date_time'], format = "%Y%m%d %H:%M")
Then make sure your rows are ordered by time (important, since cumsum depends on row order):
df = df.sort_values('date_time')
You can group by date_str, since it represents your daily groups. A plain aggregation, for example, collapses each day to a single row:
df.groupby('date_str').agg({
    'quantity': 'sum',
    'msg': lambda x: ' '.join(x)
})
          quantity                  msg
date_str
20191224       100      Merry Christmas
20191225       150   Merry Christmas :)
In your case, what you want instead is to transform with cumsum, which keeps one value per row rather than one per group:
df['daily_cum_quantity'] = df.groupby('date_str')['quantity'].transform('cumsum')
Resulting in:
   date_str           date_time  quantity        msg  daily_cum_quantity
0  20191224 2019-12-24 01:00:00        50      Merry                  50
1  20191224 2019-12-24 02:30:00        50  Christmas                 100
2  20191225 2019-12-25 02:00:00        50      Merry                  50
3  20191225 2019-12-25 04:25:00        50  Christmas                 100
4  20191225 2019-12-25 06:50:00        50         :)                 150
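As a side note, if you don't want to rely on a separate string column, the same daily grouping can be derived from the timestamp itself. A minimal sketch, assuming the df above (the column name cum_from_ts is arbitrary):

# group on the calendar date extracted from the timestamp, then take the running sum
df['cum_from_ts'] = df.groupby(df['date_time'].dt.date)['quantity'].cumsum()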

Vincent
# cumulative t within each day; the grouped cumsum keeps df's index, so merge it back on the index
temp = df.groupby('date')['t'].cumsum().rename('t_cumulative')
df = df.merge(temp, left_index=True, right_index=True, how='left')
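Since the grouped cumsum is already aligned with df's index, a plain column assignment (no merge) should give the same result; a minimal equivalent sketch:

# index alignment makes the merge unnecessary
df['t_cumulative'] = df.groupby('date')['t'].cumsum()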

user3607022