How to calculate count and percentage in groupby in Python

Question

I have the following output after a groupby:

Publisher.groupby('Category')['Title'].count()
Category
Coding          5
Hacking         7
Java            1
JavaScript      5
LEGO           43
Linux           7
Networking      5
Others        123
Python          8
R               2
Ruby            4
Scripting       4 
Statistics      2
Web             3

In addition to the counts above, I also want the percentage for each category, i.e. for the first row 5*100/219, and so on. I am trying the following:

 Publisher.groupby('Category')['Title'].agg({'Count':'count','Percentage':lambda x:x/x.sum()})

But it gives me an error. Please help.

related: http://stackoverflow.com/questions/36609176/groupby-pandas-calculate-percentage and http://stackoverflow.com/questions/23627782/pandas-groupby-size-and-percentages and http://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby — EdChum, Oct 06 '16 at 10:08
Then you need to post raw data, your code and the errors in order for us to help you — EdChum, Oct 06 '16 at 10:19
Possible duplicate of [Pandas: .groupby().size() and percentages](https://stackoverflow.com/questions/23627782/pandas-groupby-size-and-percentages) — iff_or, Sep 16 '17 at 20:33

score 18 · Accepted Answer · answered Oct 06 '16 at 10:14

I think you can use:

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title']  / P['Title'].sum()

Sample:

import pandas as pd

Publisher = pd.DataFrame({'Category':['a','a','s'],
                          'Title':[4,5,6]})

print (Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title']  / P['Title'].sum()
print (P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
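
The same result can also be built in one chain. This is a minimal sketch, assuming the sample data above and a pandas version that supports named aggregation (0.25 or later); the column names `Count` and `Percentage` are chosen here for illustration. The count is computed inside `agg`, and the percentage is added afterwards with `assign`, because it needs the total across all groups rather than a per-group value.

```python
import pandas as pd

Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})

P = (
    Publisher.groupby('Category')['Title']
    # named aggregation: one scalar (the count) per group
    .agg(Count='count')
    # percentage needs the grand total, so compute it on the aggregated frame
    .assign(Percentage=lambda d: 100 * d['Count'] / d['Count'].sum())
    # turn the 'Category' index back into a regular column
    .reset_index()
)
print(P)
```

Keeping the percentage outside `agg` avoids returning a `Series` from an aggregation function, which is what the lambda in the question does.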

answered Oct 06 '16 at 10:14

jezrael

Great. It does work. I was wondering if we can do it in the groupby itself, by applying `count and percentage` functions in agg? – Neil Oct 06 '16 at 10:28
Hmm, maybe check the links posted by `EdChum`, but I think it is problematic, because the lambda returns a `Series`, which gives the error `Exception: Must produce aggregated value`. I am not sure. – jezrael Oct 06 '16 at 10:35
Not sure why, but this doesn't work for me. If I do a `data.groupby('contract type')['count'].count()` then I get a weird little dataset out where I cannot reference the columns. When I try to add the `summ['perc'] = summ['count']/summ['count'].sum()*100` then I get an error saying "column 'count' does not exist" – GenDemo May 07 '23 at 23:45
@GenDemo - because `.reset_index()` is missing ;) – jezrael May 09 '23 at 05:10
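
To illustrate the point about `.reset_index()`: the groupby count returns a `Series` indexed by the grouping column, so there are no columns to reference until the index is reset. The sketch below uses made-up data; only the names `contract type`, `count`, `summ` and `perc` come from the comment above.

```python
import pandas as pd

# hypothetical data, shaped like the frame described in the comment
data = pd.DataFrame({'contract type': ['full', 'full', 'part'],
                     'count': [1, 1, 1]})

summ = data.groupby('contract type')['count'].count()
# summ is a Series whose index holds the contract types;
# 'count' is only the Series name, not a column that can be selected
is_series = isinstance(summ, pd.Series)

# reset_index() moves the index into a column and returns a DataFrame
# with the columns 'contract type' and 'count'
summ = summ.reset_index()
summ['perc'] = summ['count'] / summ['count'].sum() * 100
print(summ)
```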

score 0 · Answer 2 · answered Apr 12 '21 at 13:15

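import pandas as pd

df = pd.DataFrame({'Category':['a','a','s'],
                   'Title':[4,5,6]})

# count titles per category, then divide each count by the total
# (note: df is overwritten with the resulting Series)
df=df.groupby('Category')['Title'].count().rename("percentage").transform(lambda x: x/x.sum())

df.reset_index()

# output as a DataFrame; the values are fractions, multiply by 100 for percentages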

  Category  percentage
0        a    0.666667
1        s    0.333333

#please let me know if it doesn't solve your current problem
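
A related shortcut, sketched here under the assumption that `Title` has no missing values (so counting titles per category is the same as counting rows per category): `value_counts` can return either raw counts or fractions via `normalize=True`. The `result` frame and its column names are illustrative, not part of the answers above.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['a', 'a', 's'],
                   'Title': [4, 5, 6]})

# rows per category
counts = df['Category'].value_counts()
# share of rows per category, as fractions that sum to 1
fractions = df['Category'].value_counts(normalize=True)

# both Series are indexed by category, so they align when combined
result = pd.DataFrame({'count': counts, 'percentage': 100 * fractions})
print(result)
```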
