12

I have the following output after grouping by `Category`:

Publisher.groupby('Category')['Title'].count()
Category
Coding          5
Hacking         7
Java            1
JavaScript      5
LEGO           43
Linux           7
Networking      5
Others        123
Python          8
R               2
Ruby            4
Scripting       4 
Statistics      2
Web             3

I also want the percentage in the above output, i.e. 5*100/219 for the first row, and so on. I am doing the following:

Publisher.groupby('Category')['Title'].agg({'Count':'count','Percentage':lambda x:x/x.sum()})

But it gives me an error. Please help.

EdChum
Neil
  • related: http://stackoverflow.com/questions/36609176/groupby-pandas-calculate-percentage and http://stackoverflow.com/questions/23627782/pandas-groupby-size-and-percentages and http://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby – EdChum Oct 06 '16 at 10:08
  • @EdChum It does not seem to work in my case. – Neil Oct 06 '16 at 10:17
  • Then you need to post raw data, your code and the errors in order for us to help you – EdChum Oct 06 '16 at 10:19
  • Possible duplicate of [Pandas: .groupby().size() and percentages](https://stackoverflow.com/questions/23627782/pandas-groupby-size-and-percentages) – iff_or Sep 16 '17 at 20:33

2 Answers

18

I think you can use:

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title']  / P['Title'].sum()

Sample:

import pandas as pd

Publisher = pd.DataFrame({'Category':['a','a','s'],
                          'Title':[4,5,6]})

print (Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title']  / P['Title'].sum()
print (P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
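
As a check against the figures in the question (5 of 219 titles in the first row), here is a minimal sketch that applies the same two lines to the counts shown there. The `counts` Series is typed in by hand from the question's output and stands in for `Publisher.groupby('Category')['Title'].count()`; the original `Publisher` data is not available.

```python
import pandas as pd

# Category counts copied by hand from the groupby output in the question.
counts = pd.Series(
    {"Coding": 5, "Hacking": 7, "Java": 1, "JavaScript": 5, "LEGO": 43,
     "Linux": 7, "Networking": 5, "Others": 123, "Python": 8, "R": 2,
     "Ruby": 4, "Scripting": 4, "Statistics": 2, "Web": 3},
    name="Title",
)
counts.index.name = "Category"

# Same approach as above: turn the index into a column, then divide by the total.
P = counts.reset_index()
total = int(P["Title"].sum())
P["Percentage"] = 100 * P["Title"] / total

print(total)
print(P.head())
```

The total comes out at 219, so the first row is 5*100/219, which matches the calculation requested in the question.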
jezrael
  • Great. It does work. I was wondering if we can do it in groupby and then apply `count` and `percentage` functions in agg? – Neil Oct 06 '16 at 10:28
  • Hmm, maybe check the links from `EdChum`, but I think it is problematic because the function returns a `Series`, hence the error `Exception: Must produce aggregated value`. I am not sure. – jezrael Oct 06 '16 at 10:35
  • Not sure why, but this doesn't work for me. If I do `data.groupby('contract type')['count'].count()` I get a weird little dataset out where I cannot reference the columns. When I try to add `summ['perc'] = summ['count']/summ['count'].sum()*100` I get an error saying "column 'count' does not exist". – GenDemo May 07 '23 at 23:45
  • @GenDemo - because `.reset_index()` is missing ;) – jezrael May 09 '23 at 05:10
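
On the follow-up question about doing both inside `agg`: a function passed to `agg` has to return one value per group, and `x/x.sum()` returns a whole `Series`, which fits the error quoted in the comments. A group also only sees its own rows, so it cannot know the overall total. One possible workaround, sketched here on the sample data and assuming a pandas version with named aggregation (0.25 or later, as far as I know), is to compute the total first and let the lambda refer to it. The names `total` and `result` are illustrative only.

```python
import pandas as pd

Publisher = pd.DataFrame({"Category": ["a", "a", "s"], "Title": [4, 5, 6]})

# Overall number of non-null titles, computed once outside the groupby.
total = Publisher["Title"].count()

result = (
    Publisher.groupby("Category")["Title"]
    # Each aggregation returns a single scalar per group.
    .agg(Count="count", Percentage=lambda s: 100 * s.count() / total)
    .reset_index()
)

print(result)
```

This is not necessarily better than the two-line version in the answer; it only shows that the percentage can be produced in the same `agg` call once the total is known.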
0
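import pandas as pd

df = pd.DataFrame({'Category':['a','a','s'],
                   'Title':[4,5,6]})

# count titles per category, then divide each count by the overall total
df = df.groupby('Category')['Title'].count().rename("percentage").transform(lambda x: x/x.sum())

df.reset_index()

# output as a DataFrame (values are fractions of the total, not multiplied by 100)

  Category  percentage
0        a    0.666667
1        s    0.333333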

#please let me know if it doesn't solve your current problem
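
The snippet above yields fractions (0.666667) rather than percentages. If only the percentage per category is needed, a shorter alternative that is not part of this answer is `value_counts(normalize=True)`, sketched below on the same sample data. Note that it counts rows per `Category` and so would include rows with a missing `Title`, unlike `count()` on the `Title` column; the sample has no missing values, so the two agree here.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["a", "a", "s"], "Title": [4, 5, 6]})

# Share of rows per category, scaled from a fraction to a percentage.
pct = df["Category"].value_counts(normalize=True).mul(100)

print(pct)
```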
Vishvas Chauhan