1

How to I added a 'Sum' Column to a panda groupby dataframe? I would like to do a 'Sum' on both the 'Bearish' and 'Bullish' inner column for the groupby dataframe below.

Then I would like to add two other columns:

%Bearish = Bearish/Sum*100

%Bullish = Bullish/Sum*100

group_df = df[['sentiment','message']].groupby([pd.TimeGrouper(freq='H'),'sentiment']).count()
group_df = group_df.unstack()

                    message        
sentiment           Bearish Bullish
created                            
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0
Karun
  • 561
  • 7
  • 23

1 Answers1

1

You can use concat with new DataFrame:

idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12T')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                   'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx)
print (df)
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish

group_df =df[['sentiment','message']].groupby([pd.TimeGrouper(freq='H'),'sentiment']).count()
#add ['message'] for remove Multiindex in columns
group_df = group_df['message'].unstack()

#divide by sum
#add prefix - https://stackoverflow.com/q/45453508/2901002 
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

df = pd.concat([group_df, df1], axis=1)
print (df)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143

If need GroupBy.size:

group_df = df[['sentiment','message']].groupby([pd.TimeGrouper(freq='H'),'sentiment']).size()
group_df = group_df.unstack()

df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

df = pd.concat([group_df, df1], axis=1)
print (df)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143

What is the difference between size and count in pandas?

Graham
  • 7,431
  • 18
  • 59
  • 84
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252