
I am trying to compute multiple statistics per group. I can get the count of each group, but I can't figure out how to get the percentage of each group.

Here is what I have:

In my example, I hard-coded 881 in every row to calculate the percent values, but I would like to replace 881 with something like the count of each final_stage and calculate the percentage of each final_stage.
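One possible reading of "replace 881 with the count of each final_stage", sketched under assumptions: `groupby(...).transform('size')` puts each row's group count on that row (the same shape as a column holding a constant 881), and dividing by `len(df)` turns it into that group's share of all rows. The sample frame and its values are hypothetical, since the original data was only posted as an image.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    'final_stage': ['a', 'b', 'a', 'c', 'a', 'b'],
    'd1': [10, 20, 30, 40, 50, 60],
})

# Per-row count of that row's final_stage group (instead of a constant 881).
df['ctn'] = df.groupby('final_stage')['d1'].transform('size')

# Fraction of all rows that belong to that row's group.
df['percent'] = df['ctn'] / len(df)

print(df)
```

Unlike the aggregations in the answer, this keeps one output row per input row; use it if the per-group values need to sit next to the original data.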

user9532692
  • Please post a sample df and the expected output df as text, along with an explanation; images can't be copied. – anky Apr 14 '19 at 07:57
  • From [ask]: "_DO NOT post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text._" – user2314737 Apr 14 '19 at 08:42

1 Answer


I believe you need to specify the column after groupby and pass a list of tuples pairing each new column name with its aggregate function:

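```python
# ctn: rows per final_stage; percent: that count divided by the total number of rows
df.groupby('final_stage')['d1'].agg([('ctn', 'size'), ('percent', lambda x: len(x) / len(df))])
```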
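A self-contained run of the tuple-list form on hypothetical data (only the column names come from the question). It also shows the keyword-style "named aggregation" spelling that pandas 0.25 and later accept on a single-column groupby; both should produce a `ctn` column with group sizes and a `percent` column with each group's fraction of all rows.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the question.
df = pd.DataFrame({
    'final_stage': ['a', 'b', 'a', 'c', 'a', 'b'],
    'd1': [10, 20, 30, 40, 50, 60],
})

# List of (new_name, func) tuples, as in the answer.
out = df.groupby('final_stage')['d1'].agg(
    [('ctn', 'size'), ('percent', lambda x: len(x) / len(df))]
)

# Equivalent keyword-style named aggregation (pandas >= 0.25).
out_named = df.groupby('final_stage')['d1'].agg(
    ctn='size', percent=lambda x: len(x) / len(df)
)

print(out)
```

Multiply `percent` by 100 if a 0-100 scale is wanted instead of a fraction.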

Or:

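```python
# Count rows per final_stage and turn the result into a DataFrame with a 'ctn' column
df1 = df.groupby('final_stage')['d1'].size().reset_index(name='ctn')
# Divide each group's count by the total number of rows in df
df1['percent'] = df1['ctn'] / len(df)
```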
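A swapped-in alternative, not part of the original answer: `Series.value_counts` with `normalize=True` returns the fraction of rows per value directly, so counts and shares can be built from the grouping column alone. `value_counts` orders by frequency rather than by key, so the `sort_index()` calls here are only to match the `groupby` ordering. The data is again hypothetical.

```python
import pandas as pd

# Hypothetical data; column names mirror the question.
df = pd.DataFrame({
    'final_stage': ['a', 'b', 'a', 'c', 'a', 'b'],
    'd1': [10, 20, 30, 40, 50, 60],
})

counts = df['final_stage'].value_counts().sort_index()               # rows per group
shares = df['final_stage'].value_counts(normalize=True).sort_index()  # fraction of all rows

# Side by side; shares are multiplied by 100 to express them as percentages.
df1 = pd.concat([counts, shares * 100], axis=1, keys=['ctn', 'percent'])
df1 = df1.rename_axis('final_stage').reset_index()
print(df1)
```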
jezrael
  • Thanks so much for your prompt response! I am struggling with another groupby statement as shown in the following [link](https://stackoverflow.com/questions/55663359/python-summarizing-aggregating-groups-and-sub-groups-in-dataframe/55663833?noredirect=1#comment98022929_55663833). I greatly appreciate your help :) – user9532692 Apr 14 '19 at 08:04
  • @user9532692 - added solution – jezrael Apr 14 '19 at 08:52