Take the following dataframe:
import pandas as pd
from random import choice, randint
mos = ['2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01'
, '2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01']
df = pd.DataFrame({'month': mos
, 'group': [choice(['1', '2']) for _ in range(len(mos))]
, 'revenue': [randint(50, 150) for _ in range(len(mos))]})
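The revenue figures are random, so each run produces different numbers; the tables below show one example run. If you want a repeatable sample, you can seed the generator first:
from random import seed
seed(42)  # any fixed seed works; only needed if you want repeatable sample values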
The dataframe will look something like this:
idx month group revenue
0 2019-01-01 1 111
1 2019-02-01 2 140
2 2019-03-01 1 78
3 2019-04-01 2 95
...
I would like to create a new, aggregated dataframe that sums revenue by month, with a separate column for each group's revenue. It would look something like this:
idx month group_1_revenue group_2_revenue
0 2019-01-01 50 108
1 2019-02-01 230 0
2 2019-03-01 193 0
3 2019-04-01 91 138
4 2019-05-01 62 0
The best I've come up with to achieve this is:
dfs_li = []
for grp in df['group'].unique():
    sub_df = df.loc[df['group'] == grp]
    # sum only the revenue column so the string 'group' column doesn't get in the way
    sub_df = sub_df.groupby('month')['revenue'].sum().reset_index()
    sub_df.columns = ['month', f'group_{grp}_revenue']
    dfs_li.append(sub_df)
agg_df = dfs_li[0].merge(dfs_li[1], on='month', how='outer').fillna(0)
I am assuming there is a better way to achieve this, but I haven't found it. The above feels inefficient, and the final merge is hard-coded for exactly two groups, so it would not generalize to k groups (not a requirement, but it would be ideal).
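I suppose I could generalize the merge step over however many per-group frames there are with functools.reduce, along these lines (just a sketch reusing dfs_li from the loop above; I haven't tested it beyond the toy data):
from functools import reduce

# chain the outer merges across every per-group frame in dfs_li
agg_df = reduce(
    lambda left, right: left.merge(right, on='month', how='outer'),
    dfs_li
).fillna(0)
But that still feels like stitching the frames together by hand rather than using a built-in aggregation.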
Is there a better way to do this?