Python how to groupby into consolidated dataframe?

Question

Here is my code as of now:

d = {}
for stage in ['doggo', 'floofer', 'puppo', 'pupper']:
    #d[stage] =df.groupby([stage]).agg({'retweet_count': 'sum'})
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)

It produces this:

         doggo      floofer     puppo       pupper
None    1387471.0   1517639.0   1472697.0   1444766.0
doggo   159188.0    NaN         NaN         NaN
floofer NaN         29020.0     NaN         NaN
puppo   NaN         NaN         73962.0     NaN
pupper  NaN         NaN         NaN         101893.0

What I would really like to produce is this:

         doggo      floofer     puppo       pupper
None    1387471.0   1517639.0   1472697.0   1444766.0
stage   159188.0    29020.0     73962.0     101893.0

Does anyone know how to accomplish this?

Can you provide some sample data to begin with? It seems like you could perhaps map `['doggo', 'floofer', 'puppo', 'pupper']` all to `'stage'` and group by the mapped value. — ALollz, Feb 16 '19 at 20:15
I don't know how to post data frames neatly :/ it's unreadable if I copy paste, and a mess to make even — Chris Macaluso, Feb 16 '19 at 20:25
You can see this post: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples for how to make a good reproducible example. — ALollz, Feb 16 '19 at 20:30
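ALollz's suggestion of mapping each stage name to 'stage' before grouping could look roughly like the sketch below. The sample frame is hypothetical, since no data was posted; it assumes each stage column holds either the stage name or the string 'None', which is what the index in the output above suggests.

```python
import pandas as pd

stages = ['doggo', 'floofer', 'puppo', 'pupper']

# Hypothetical sample data (not from the original post): each stage column
# holds either its own stage name or the string 'None'.
df = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'None', 'None', 'None'],
    'floofer': ['None', 'floofer', 'None', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None'],
    'retweet_count': [10, 20, 30, 40, 50],
})

d = {}
for stage in stages:
    # Relabel the stage name as 'stage'; 'None' entries are left unchanged
    key = df[stage].replace(stage, 'stage')
    # Group by the relabelled values so every column shares the same two labels
    d[stage] = df.groupby(key)['retweet_count'].sum()

stage_retweets = pd.DataFrame(d)
print(stage_retweets)
```

Because every per-stage result now uses the same two index labels ('None' and 'stage'), the columns line up on two rows and no NaN padding appears.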

score 1 · Accepted Answer · edited Apr 11 '21 at 16:03

import numpy as np
import pandas as pd

d = {}
# 1 - Put your stages in a list variable
stages = ['doggo', 'floofer', 'puppo', 'pupper']

for stage in stages:
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)
print(stage_retweets)

# 2 - Create a column that flags whether each index label is in the stages list
# Important: make sure the index has only one level, otherwise stage_retweets.index.isin(stages) won't work
stage_retweets['is_stage'] = np.where(stage_retweets.index.isin(stages), 'Stage', 'None')
print(stage_retweets)

# 3 - Groupby this new column
stage_retweets = stage_retweets.groupby('is_stage').sum().reset_index()
print(stage_retweets)
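To check the three steps end to end, here is a self-contained sketch run on a small made-up frame. The data is an assumption (the question did not include any), and this version leaves out the final reset_index() so that 'is_stage' stays as the index, which is closer to the layout asked for in the question.

```python
import numpy as np
import pandas as pd

stages = ['doggo', 'floofer', 'puppo', 'pupper']

# Hypothetical sample data: one row per stage plus one row with no stage at all
df = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'None', 'None', 'None'],
    'floofer': ['None', 'floofer', 'None', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None'],
    'retweet_count': [10, 20, 30, 40, 50],
})

# Step 1: one groupby per stage column, combined into a single frame
d = {stage: df.groupby(stage)['retweet_count'].sum() for stage in stages}
stage_retweets = pd.DataFrame.from_dict(d)

# Step 2: flag the rows whose index label is one of the stage names
stage_retweets['is_stage'] = np.where(
    stage_retweets.index.isin(stages), 'Stage', 'None'
)

# Step 3: collapse the four stage rows into one; sum() skips the NaN cells
result = stage_retweets.groupby('is_stage').sum()
print(result)
```

The result has two rows, 'None' and 'Stage', with one column per stage. The values come back as floats because the intermediate frame contains NaN.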
