0

Before this is marked as a duplicate, I've looked at the following: question1 question2 source3

For each farmer, I'm trying to calculate two things: 1) the % of ripe fruit that is fruit x: %(ripe fruit x)/(total ripe fruit) 2) the % of ripe fruit of fruit x: %(ripe fruit x)/(total fruit x)

based on the ripe fruit indicator (1 for ripe and 0 for not ripe).

Input:

df = pd.DataFrame({'Farmer': ['Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Tims','Tims','Tims','Tims'],
                 'Fruit':['Apple','Apple','Apple','Grape','Grape','Grape','Grape','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry'],
                 'Type': ['Red','Yellow','Green','Red seedless','Red with seeds','Green','Purple','Montmorency','Morello','Bing','Rainer','Montmorency','Morello','Bing','Rainer'],
                 'Number':[2,6,2,1,1,6,2,3,1,3,3,3,1,3,3],
                 'Ripe':[1,1,0,1,0,1,1,0,0,0,1,0,0,0,1]})
df

    Farmer  Fruit   Number  Ripe    Type
0   Sallys  Apple   2        1      Red
1   Sallys  Apple   6        1      Yellow
2   Sallys  Apple   2        0      Green
3   Sallys  Grape   1        1      Red seedless
4   Sallys  Grape   1        0      Red with seeds
5   Sallys  Grape   6        1      Green
6   Sallys  Grape   2        1      Purple
7   Sallys  Cherry  3        0      Montmorency
8   Sallys  Cherry  1        0      Morello
9   Sallys  Cherry  3        0      Bing
10  Sallys  Cherry  3        1      Rainer
11  Tims    Cherry  3        0      Montmorency
12  Tims    Cherry  1        0      Morello
13  Tims    Cherry  3        0      Bing
14  Tims    Cherry  3        1      Rainer

Desired Output:

    Farmer  Fruit   %(ripe fruit x)/(total ripe fruit)  %(ripe fruit x)/(total fruit x)
0   Sallys  Apple   40                                  80
1   Sallys  Grape   45                                  90
2   Sallys  Cherry  15                                  30
3   Tims    Cherry  100                                 30
Mariah Akinbi
  • 386
  • 1
  • 5
  • 19

1 Answers1

2

First aggregate sum and reshape by unstack, then divide by div with sum:

df1 = df.groupby(['Farmer','Fruit','Ripe'], sort=False)['Number'].sum().unstack()

a = df1[1].div(df1[1].sum(level=0)).mul(100)
b = df1[1].div(df1.sum(axis=1)).mul(100)

keys = ('%(ripe fruit x)/(total ripe fruit)','%(ripe fruit x)/(total fruit x)')
df2 = pd.concat([a,b], axis=1, keys=keys).reset_index()
print (df2)
   Farmer   Fruit  %(ripe fruit x)/(total ripe fruit)  \
0  Sallys   Apple                                40.0   
1  Sallys   Grape                                45.0   
2  Sallys  Cherry                                15.0   
3    Tims  Cherry                               100.0   

   %(ripe fruit x)/(total fruit x)  
0                             80.0  
1                             90.0  
2                             30.0  
3                             30.0  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • hi @jezrael, thank you!! How did you decide to use .div() and create separate dataframes instead of creating a new column and using the division operator (something like df['new]=df['Ripe']/sum(whatever)) – Mariah Akinbi Oct 17 '18 at 19:45
  • 1
    @MariahAkinbi - Answer is simply - `div` is more flexible like `/`, because is possible difine parameter s `axis=1`, `axis=0`. And also less `()` in code :) – jezrael Oct 18 '18 at 05:09