0

Take the following dataframe:

import pandas as pd
from random import choice, randint

mos = ['2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01', '2019-05-01'
    , '2019-01-01', '2019-02-01', '2019-03-01', '2019-04-01']

df = pd.DataFrame({'month': mos
                      , 'group': [choice(['1', '2']) for _ in range(len(mos))]
                      , 'revenue': [randint(50, 150) for _ in range(len(mos))]})

Which will look something like this:

idx month          group  revenue
0   2019-01-01     1      111
1   2019-02-01     2      140
2   2019-03-01     1       78
3   2019-04-01     2       95
...

I would like to create a new, aggregated dataframe which sums revenue by month in separate columns for each group, which would look something like this:

idx  month           group_1_revenue  group_2_revenue
0    2019-01-01       50              108
1    2019-02-01      230                0
2    2019-03-01      193                0
3    2019-04-01       91              138
4    2019-05-01       62                0

The best I've come up with to achieve this is:

dfs_li = []
for grp in df['group'].unique():
    sub_df = df.loc[df['group'] == grp]
    sub_df = sub_df.groupby('month').sum().reset_index()
    sub_df.columns = ['month', f'group_{grp}_revenue']
    dfs_li.append(sub_df)

agg_df = dfs_li[0].merge(dfs_li[1], on='month', how='outer').fillna(0)

I am assuming there is a better way to achieve my objective here, but I haven't found it. The above feels like it is likely inefficient, and would not generalize to k groups (which is not a requirement, but would be ideal).

Is there a better way to do this?

xibalba1
  • 538
  • 8
  • 16

0 Answers0