I have a dataframe which is basically several timeseries stacked on top of one another. Each time series has a unique label (group) and they have different date ranges.
date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03',
'2010-01-06', '2010-01-01', '2010-01-03']))
group = [1,1,1,1, 2, 2]
value = [1,2,3,4,5,6]
df = pd.DataFrame({'date':date, 'group':group, 'value':value})
df
date group value
0 2010-01-01 1 1
1 2010-01-02 1 2
2 2010-01-03 1 3
3 2010-01-06 1 4
4 2010-01-01 2 5
5 2010-01-03 2 6
I would like to resample the data so that there is an entry for every single combination of date and group (padding values to NaN if there was no observation that day or it's outside the date range). Example output would be:
date group value
2010-01-01 1 1
2010-01-02 1 2
2010-01-03 1 3
2010-01-04 1 NaN
2010-01-05 1 NaN
2010-01-06 1 4
2010-01-01 2 5
2010-01-02 2 NaN
2010-01-03 2 6
2010-01-04 2 NaN
2010-01-05 2 NaN
2010-01-06 2 NaN
I have a solution which works but I suspect there are better approaches. My solution is to first pivot the data then unstack, groupby and resample. Basically all that's really needed is to do a groupby and resample but specifying the max and min ranges of the resampling with the max and min values of the whole date column but I can't see anyway to do that.
df = (df.pivot(index='dates', columns='groups', values='values')
.unstack()
.reset_index()
.set_index('dates')
.groupby('groups').resample('D').asfreq()
.drop('groups', axis=1)
.reset_index()
.rename(columns={0:'values'}))[['dates','groups', 'values']]