pandas divide value by aggregated sum based on indicator

Question

Before this is marked as a duplicate, I've looked at the following: question1 question2 source3

For each farmer, I'm trying to calculate two things: 1) the % of ripe fruit that is fruit x: %(ripe fruit x)/(total ripe fruit) 2) the % of ripe fruit of fruit x: %(ripe fruit x)/(total fruit x)

based on the ripe fruit indicator (1 for ripe and 0 for not ripe).

Input:

df = pd.DataFrame({'Farmer': ['Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Tims','Tims','Tims','Tims'],
                 'Fruit':['Apple','Apple','Apple','Grape','Grape','Grape','Grape','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry'],
                 'Type': ['Red','Yellow','Green','Red seedless','Red with seeds','Green','Purple','Montmorency','Morello','Bing','Rainer','Montmorency','Morello','Bing','Rainer'],
                 'Number':[2,6,2,1,1,6,2,3,1,3,3,3,1,3,3],
                 'Ripe':[1,1,0,1,0,1,1,0,0,0,1,0,0,0,1]})
df

    Farmer  Fruit   Number  Ripe    Type
0   Sallys  Apple   2        1      Red
1   Sallys  Apple   6        1      Yellow
2   Sallys  Apple   2        0      Green
3   Sallys  Grape   1        1      Red seedless
4   Sallys  Grape   1        0      Red with seeds
5   Sallys  Grape   6        1      Green
6   Sallys  Grape   2        1      Purple
7   Sallys  Cherry  3        0      Montmorency
8   Sallys  Cherry  1        0      Morello
9   Sallys  Cherry  3        0      Bing
10  Sallys  Cherry  3        1      Rainer
11  Tims    Cherry  3        0      Montmorency
12  Tims    Cherry  1        0      Morello
13  Tims    Cherry  3        0      Bing
14  Tims    Cherry  3        1      Rainer

Desired Output:

    Farmer  Fruit   %(ripe fruit x)/(total ripe fruit)  %(ripe fruit x)/(total fruit x)
0   Sallys  Apple   40                                  80
1   Sallys  Grape   45                                  90
2   Sallys  Cherry  15                                  30
3   Tims    Cherry  100                                 30

Could you please explain the third column? To my understanding the total ripe fruit must be 23, and going by the numbers you have posted, it seems to be otherwise. — Ankur Sinha, Oct 17 '18 at 06:59
@jezrael it looks like your answer addresses the question appropriately, but for others, (2+6)/(2+6+1+6+2+3) = 40% and (2+6)/(2+6+2) = 80%. — Mariah Akinbi, Oct 17 '18 at 13:18
@trollster absolutely! total ripe fruit for each farmer* so for Sally it is 20 and for Tim the total ripe fruit =3 — Mariah Akinbi, Oct 17 '18 at 13:20

score 2 · Answer 1 · answered Oct 17 '18 at 07:40

First aggregate sum and reshape by unstack, then divide by div with sum:

df1 = df.groupby(['Farmer','Fruit','Ripe'], sort=False)['Number'].sum().unstack()

a = df1[1].div(df1[1].sum(level=0)).mul(100)
b = df1[1].div(df1.sum(axis=1)).mul(100)

keys = ('%(ripe fruit x)/(total ripe fruit)','%(ripe fruit x)/(total fruit x)')
df2 = pd.concat([a,b], axis=1, keys=keys).reset_index()
print (df2)
   Farmer   Fruit  %(ripe fruit x)/(total ripe fruit)  \
0  Sallys   Apple                                40.0   
1  Sallys   Grape                                45.0   
2  Sallys  Cherry                                15.0   
3    Tims  Cherry                               100.0   

   %(ripe fruit x)/(total fruit x)  
0                             80.0  
1                             90.0  
2                             30.0  
3                             30.0

hi @jezrael, thank you!! How did you decide to use .div() and create separate dataframes instead of creating a new column and using the division operator (something like df['new]=df['Ripe']/sum(whatever)) — Mariah Akinbi, Oct 17 '18 at 19:45
@MariahAkinbi - Answer is simply - `div` is more flexible like `/`, because is possible difine parameter s `axis=1`, `axis=0`. And also less `()` in code :) — jezrael, Oct 18 '18 at 05:09

pandas divide value by aggregated sum based on indicator

1 Answers1