1

I want to calculate the relative percentages of a sum of a particular column of a dataframe after I have grouped by a different column. The format of the data frame would be for example:

df = pd.DataFrame({'Company': ['Company1', 'Company2', 'Company3', 'Company1', 'Company3'],
                   'STC Watts': [10.10, 9.6, 13.3, 5.6, 10.12],
                   ...other columns
                     )

At this point I need to group by the "Company" columns and get the total sum of "STC Watts" column for each company, but I would also like to have a relative percentage of that company "STC Watts" sum out of the total for the entire dataset. I tried using a couple solutions from this question but could not get any of them to work

Ajeet Verma
  • 2,938
  • 3
  • 13
  • 24

2 Answers2

0

Try this:

import pandas as pd
df = pd.DataFrame({'Company': ['Company1', 'Company2', 'Company3', 'Company1', 'Company3'],
                   'STC Watts': [10.10, 9.6, 13.3, 5.6, 10.12]})
new_df = df.groupby('Company').sum()
new_df['percent'] = ((new_df['STC Watts'] / new_df['STC Watts'].sum()) * 100).round(2)
new_df

The output would be:

          STC Watts  percent
Company                     
Company1      15.70    32.22
Company2       9.60    19.70
Company3      23.42    48.07
gtomer
  • 5,643
  • 1
  • 10
  • 21
0

You can post-process the groupby.sum adding a new column with the division by the total (optional multiplied by 100 for percent, and eventually rounded):

out = (
 df.groupby('Company', as_index=False)['STC Watts'].sum()
   .assign(percent=lambda d: d['STC Watts'].div(d['STC Watts'].sum())
                                           .mul(100).round(2)
           )
)

Output:

    Company  STC Watts  percent
0  Company1      15.70    32.22
1  Company2       9.60    19.70
2  Company3      23.42    48.07
mozway
  • 194,879
  • 13
  • 39
  • 75