
Here is my code as of now:

d = {}
for stage in ['doggo', 'floofer', 'puppo', 'pupper']:
    #d[stage] =df.groupby([stage]).agg({'retweet_count': 'sum'})
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)

It produces this:

         doggo      floofer     puppo       pupper
None    1387471.0   1517639.0   1472697.0   1444766.0
doggo   159188.0    NaN         NaN         NaN
floofer NaN         29020.0     NaN         NaN
puppo   NaN         NaN         73962.0     NaN
pupper  NaN         NaN         NaN         101893.0

What I would really like to produce is this:

         doggo      floofer     puppo       pupper
None    1387471.0   1517639.0   1472697.0   1444766.0
stage   159188.0    29020.0     73962.0     101893.0     

Does anyone know how to accomplish this?

Chris Macaluso
  • Can you provide some sample data to begin with? It seems like you could perhaps map `['doggo', 'floofer', 'puppo', 'pupper']` all to `'stage'` and group by the mapped value. – ALollz Feb 16 '19 at 20:15
  • I don't know how to post data frames neatly :/ They're unreadable if I copy and paste, and a mess to build by hand. – Chris Macaluso Feb 16 '19 at 20:25
  • You can see this post: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples for how to make a good reproducible example. – ALollz Feb 16 '19 at 20:30
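The mapping idea from the first comment could look roughly like the sketch below. The sample `df` is made up for illustration (the question does not include data), and it assumes each stage column holds either the stage name or the string 'None', which is what the output in the question suggests.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
# Assumption: each stage column contains the stage name or the string 'None'.
df = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'None', 'None', 'None'],
    'floofer': ['None', 'floofer', 'None', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None'],
    'retweet_count': [10, 20, 30, 40, 50],
})

stages = ['doggo', 'floofer', 'puppo', 'pupper']

# For each column, map its stage name to the common label 'stage' and
# group retweet_count by that mapped key instead of the raw column values.
stage_retweets = pd.DataFrame({
    s: df.groupby(np.where(df[s] == s, 'stage', 'None'))['retweet_count'].sum()
    for s in stages
})
print(stage_retweets)
```

Because every column is grouped on the same two labels, the per-column results line up on a two-row index ('None' and 'stage') with no NaN padding.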

1 Answer

import numpy as np
import pandas as pd

d = {}
# 1 - Put your stages in a list variable
stages = ['doggo', 'floofer', 'puppo', 'pupper']

for stage in stages:
    d[stage] = df.groupby(stage)['retweet_count'].sum()
stage_retweets = pd.DataFrame.from_dict(d)
print(stage_retweets)

# 2 - Add a column that flags whether each index label is one of the stages
# Important: make sure the index has only one level, otherwise stage_retweets.index.isin(stages) won't work
stage_retweets['is_stage'] = np.where(stage_retweets.index.isin(stages), 'Stage', 'None')
print(stage_retweets)

# 3 - Group by this new column and sum each stage column
stage_retweets = stage_retweets.groupby('is_stage').sum().reset_index()
print(stage_retweets)
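As a variant of steps 2 and 3, the helper column can probably be skipped by relabelling the index and grouping on it. This is only a sketch: the sample `df` is invented for illustration, and it assumes the same layout as the question (each stage column holds the stage name or the string 'None').

```python
import pandas as pd

stages = ['doggo', 'floofer', 'puppo', 'pupper']

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    'doggo':   ['doggo', 'None', 'None', 'None', 'None'],
    'floofer': ['None', 'floofer', 'None', 'None', 'None'],
    'puppo':   ['None', 'None', 'puppo', 'None', 'None'],
    'pupper':  ['None', 'None', 'None', 'pupper', 'None'],
    'retweet_count': [10, 20, 30, 40, 50],
})

# Step 1 as in the answer: one grouped sum per stage column.
d = {s: df.groupby(s)['retweet_count'].sum() for s in stages}
stage_retweets = pd.DataFrame.from_dict(d)

# Relabel every stage name in the index as 'stage', then sum rows that
# share a label. sum() skips NaN, so the diagonal collapses into one row.
collapsed = (
    stage_retweets
    .rename(index=lambda label: 'stage' if label in stages else label)
    .groupby(level=0)
    .sum()
)
print(collapsed)
```

The result keeps 'None' and 'stage' as the index itself, so no `reset_index()` is needed afterwards.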
Charles R