Let's say that I start with this dataframe
d = {'price': [10, 12, 8, 12, 14, 18, 10, 20],
'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
df = pd.DataFrame(d)
df['a_date'] = pd.date_range('01/01/2018',
periods=8,
freq='W')
df
price volume a_date
0 10 50 2018-01-07
1 12 60 2018-01-14
2 8 40 2018-01-21
3 12 100 2018-01-28
4 14 50 2018-02-04
5 18 100 2018-02-11
6 10 40 2018-02-18
7 20 50 2018-02-25
Now, I would like to resample/groupby in such a way that the data is aggregated on time intervals of roughly 10 days, but with pre-defined start and end dates, which fall on the 10th, 20th and last day of the month, such as:
2018-01-01 to 2018-01-10
2018-01-11 to 2018-01-20
2018-01-21 to 2018-01-31
2018-02-01 to 2018-02-10
2018-02-11 to 2018-02-20
2018-02-21 to 2018-02-28
and the results would be, in case of summing across the intervals:
price volume
a_date
2018-01-10 10 50
2018-01-20 12 60
2018-01-31 20 140
2018-02-10 14 50
2018-02-20 28 140
2018-02-28 20 50
The closest that I can get to this is doing df.resample('10D', on='a_date').sum()
but clearly I need something more customized as interval.
I would be happy with just passing an array of intervals, but I don't think that's possible.
I've tried, as an experiment:
td = pd.to_datetime('2018-01-10') - pd.to_datetime('2018-01-01')
df.resample(td, on='a_date').sum()
but the pandas.Timedelta
does not keep information on the specific dates.
EDIT:
A different dataframe to test the first day of the month:
d = {'price': np.arange(20)+1,
'volume': np.arange(20)+5}
df = pd.DataFrame(d)
df['a_date'] = pd.date_range('01/01/2018',
periods=20,
freq='D')
applying the accepted answer gives (the first day is not taken into account):
a_date price volume
0 2018-01-10 54 90
1 2018-01-20 155 195
compare with (for the first interval 2018-01-01 to 2018-01-10):
df.iloc[:10].sum()
price 55
volume 95
dtype: int64