Consider the following table:
import numpy as np
import pandas as pd

np.random.seed(42)
ix = pd.date_range('2017-01-01', '2017-01-15', freq='60s')
df = pd.DataFrame(
    {
        'val': np.random.random(size=ix.shape[0]),
        'active': np.random.choice([0, 1], size=ix.shape[0])
    },
    index=ix
)
df.sample(10)
yielding:
                     active       val
2017-01-02 06:05:00       1  0.774654
2017-01-04 08:15:00       1  0.934796
2017-01-13 01:02:00       0  0.792351...
My objective is to compute:
- the sum of val per day
- the sum of val per day over active rows only (active == 1)
Sum per day
This one is straightforward:
gb = df.groupby(pd.to_datetime(df.index.date))
overall_sum_per_day = gb['val'].sum().rename('overall')
Sum per active day
This is a little trickier (see this):
active_sum_per_day = gb.apply(lambda g: g.loc[g['active'] == 1, 'val'].sum()).rename('active')
My question
How can I combine the two? Using concat:
pd.concat([overall_sum_per_day, active_sum_per_day], axis=1)
I can achieve my objective, but I fail to do it in one go, applying the two aggregations at once. Is it possible? See this comment.
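To make "in one go" concrete, here is a minimal sketch of the kind of single-pass result I am after. It assumes a pandas version that supports named aggregation (0.25 or later), and it relies on a helper column, active_val, which is a hypothetical name introduced only for this illustration: it holds val where active == 1 and 0 elsewhere, so a plain sum over it gives the active sum. This is one possible approach, not necessarily the idiomatic one.

```python
import numpy as np
import pandas as pd

# Same setup as above.
np.random.seed(42)
ix = pd.date_range('2017-01-01', '2017-01-15', freq='60s')
df = pd.DataFrame(
    {
        'val': np.random.random(size=ix.shape[0]),
        'active': np.random.choice([0, 1], size=ix.shape[0]),
    },
    index=ix,
)

# Hypothetical helper column: val for active rows, 0.0 for inactive rows.
masked = df.assign(active_val=df['val'].where(df['active'] == 1, 0.0))

# A single groupby with two named aggregations, one per output column.
both = masked.groupby(pd.to_datetime(masked.index.date)).agg(
    overall=('val', 'sum'),
    active=('active_val', 'sum'),
)
print(both.head())
```

The result has one row per day with the columns overall and active, which should match the concat output. The trade-off is the extra temporary column; days with no active rows come out as 0 rather than missing.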