Before this is marked as a duplicate, I've looked at the following: question1 question2 source3
For each farmer, I'm trying to calculate two things: 1) the % of ripe fruit that is fruit x: %(ripe fruit x)/(total ripe fruit) 2) the % of ripe fruit of fruit x: %(ripe fruit x)/(total fruit x)
based on the ripe fruit indicator (1 for ripe and 0 for not ripe).
Input:
df = pd.DataFrame({'Farmer': ['Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Tims','Tims','Tims','Tims'],
'Fruit':['Apple','Apple','Apple','Grape','Grape','Grape','Grape','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry'],
'Type': ['Red','Yellow','Green','Red seedless','Red with seeds','Green','Purple','Montmorency','Morello','Bing','Rainer','Montmorency','Morello','Bing','Rainer'],
'Number':[2,6,2,1,1,6,2,3,1,3,3,3,1,3,3],
'Ripe':[1,1,0,1,0,1,1,0,0,0,1,0,0,0,1]})
df
Farmer Fruit Number Ripe Type
0 Sallys Apple 2 1 Red
1 Sallys Apple 6 1 Yellow
2 Sallys Apple 2 0 Green
3 Sallys Grape 1 1 Red seedless
4 Sallys Grape 1 0 Red with seeds
5 Sallys Grape 6 1 Green
6 Sallys Grape 2 1 Purple
7 Sallys Cherry 3 0 Montmorency
8 Sallys Cherry 1 0 Morello
9 Sallys Cherry 3 0 Bing
10 Sallys Cherry 3 1 Rainer
11 Tims Cherry 3 0 Montmorency
12 Tims Cherry 1 0 Morello
13 Tims Cherry 3 0 Bing
14 Tims Cherry 3 1 Rainer
Desired Output:
Farmer Fruit %(ripe fruit x)/(total ripe fruit) %(ripe fruit x)/(total fruit x)
0 Sallys Apple 40 80
1 Sallys Grape 45 90
2 Sallys Cherry 15 30
3 Tims Cherry 100 30