I have an example dataframe like this:
import pandas as pd

df = pd.DataFrame({
    "id": [0] * 5 + [1] * 5,
    "time": ['2015-01-01', '2015-01-03', '2015-01-04', '2015-01-08', '2015-01-10',
             '2015-02-02', '2015-02-04', '2015-02-06', '2015-02-11', '2015-02-13'],
    "hit": [0, 3, 8, 2, 5, 6, 12, 0, 7, 3],
})
df["time"] = df["time"].astype("datetime64[ns]")
df = df[["id", "time", "hit"]]
df
This outputs:
   id       time  hit
0   0 2015-01-01    0
1   0 2015-01-03    3
2   0 2015-01-04    8
3   0 2015-01-08    2
4   0 2015-01-10    5
5   1 2015-02-02    6
6   1 2015-02-04   12
7   1 2015-02-06    0
8   1 2015-02-11    7
9   1 2015-02-13    3
Then I performed a groupby on id and time (per day):
df.groupby(['id', pd.Grouper(key='time', freq='1D')]).hit.sum().to_frame()
This resulted in:
               hit
id time
0  2015-01-01    0
   2015-01-03    3
   2015-01-04    8
   2015-01-08    2
   2015-01-10    5
1  2015-02-02    6
   2015-02-04   12
   2015-02-06    0
   2015-02-11    7
   2015-02-13    3
However, I want to keep every calendar day in the result, with hit = 0 on days that have no rows, and also compute the number of days since the first day for each id. My desired output:
               hit  day_since
id time
0  2015-01-01    0          1
   2015-01-02    0          2
   2015-01-03    3          3
   2015-01-04    8          4
   2015-01-05    0          5
   2015-01-06    0          6
   2015-01-07    0          7
1  2015-02-02    6          1
   2015-02-03    0          2
   2015-02-04   12          3
   2015-02-05    0          4
   2015-02-06    0          5
   2015-02-07    0          6
   2015-02-08    0          7
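To make the target concrete, here is a minimal sketch of one way the zero-filled daily frame with a day_since column might be built. It assumes that resampling each id to daily frequency (so empty days sum to 0) and subtracting each id's first date is an acceptable reading of what I describe; the names `daily` and `first_day` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0] * 5 + [1] * 5,
    "time": pd.to_datetime([
        "2015-01-01", "2015-01-03", "2015-01-04", "2015-01-08", "2015-01-10",
        "2015-02-02", "2015-02-04", "2015-02-06", "2015-02-11", "2015-02-13",
    ]),
    "hit": [0, 3, 8, 2, 5, 6, 12, 0, 7, 3],
})

# Daily sums per id; days without any rows get a sum of 0.
daily = (
    df.set_index("time")
      .groupby("id")["hit"]
      .resample("D")
      .sum()
      .reset_index()
)

# Days elapsed since each id's first date, counting the first day as 1.
first_day = daily.groupby("id")["time"].transform("min")
daily["day_since"] = (daily["time"] - first_day).dt.days + 1

daily = daily.set_index(["id", "time"])
```

Note that this sketch covers each id's full date range (first to last observed date), whereas the desired output above lists only seven rows per id.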
cumcount does not work here because it numbers the rows within each group, regardless of their dates. In my case, I want the number of calendar days elapsed since the first date in each group.
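A small sketch of the mismatch on the original rows, assuming the same example data as above (`row_number` and `gaps` are just illustrative names): the row counter increases by 1 per row, while the gaps between consecutive dates within an id vary.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [0] * 5 + [1] * 5,
    "time": pd.to_datetime([
        "2015-01-01", "2015-01-03", "2015-01-04", "2015-01-08", "2015-01-10",
        "2015-02-02", "2015-02-04", "2015-02-06", "2015-02-11", "2015-02-13",
    ]),
    "hit": [0, 3, 8, 2, 5, 6, 12, 0, 7, 3],
})

# cumcount only numbers the rows inside each id: 1, 2, 3, 4, 5.
row_number = df.groupby("id").cumcount() + 1

# Gap in days between consecutive dates inside each id
# (NaN for the first row of each id, since it has no predecessor).
gaps = df.groupby("id")["time"].diff().dt.days
```

For id 0 the gaps are 2, 1, 4 and 2 days, so the row number does not track how many days have passed.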
Does anyone have any ideas?