
I have a follow-up question related to Calculate "group characteristics" without ddply and merge

I have a similar data frame (shown below), but I am trying to calculate the percentage of rotten fruits among the other fruits in the same category. The calculation should therefore ignore whether the fruit in question is itself rotten. The data frame below hopefully clarifies this; the Desired_Outcome column is included purely for illustration.

Ideally I would like to use ddply (along the lines of ddply(df, .(Fruit), mutate, Perc = sum(Rotten)/length(Rotten)) ). However, I cannot find a way to take into account only the values of the other rows in the same group. I think I could use a combination of if-statements based on the values of the row in question, but I wonder whether there is a more elegant way of achieving this. Many thanks in advance, W

    Fruit Rotten Desired_Outcome
1   Apple      1            0.33
2   Apple      1            0.33
3   Apple      0            0.66
4   Apple      0            0.66
5    Pear      1            0.66
6    Pear      1            0.66
7    Pear      1            0.66
8    Pear      0            1.00
9  Cherry      0            0.00
10 Cherry      0            0.00
11 Cherry      0            0.00
12 Banana      1              NA

# Example data: one row per fruit, Rotten = 1 if that fruit is rotten
Fruit <- c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3), "Banana")
Rotten <- c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1)
Desired_Outcome <- c(0.33, 0.33, 0.66, 0.66, 0.66, 0.66, 0.66, 1, 0, 0, 0, NA)
df <- data.frame(Fruit, Rotten, Desired_Outcome)
df
user1885116
  • Thanks so much. Somewhat feeling stupid that I didn't think of that. Much appreciated for pointing this out to me. – user1885116 Apr 02 '13 at 14:44
  • The fact that you left _another_ comment on your question, rather than Justin's answer (I'm joran, btw, not Justin) leads me to believe that you really do need to take a moment and un-confuse yourself about some basic mechanics of how this site works. It will help you to get help, because at the moment Justin has no idea that you're trying to talk to him. – joran Apr 03 '13 at 02:16
  • Apologies, I aimed to delete the comment immediately after writing it. Appreciate the guidance, will read the FAQ. – user1885116 Apr 03 '13 at 08:43

1 Answer

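library(plyr)  # provides ddply, mutate and .()

ddply(df, 
      .(Fruit), 
      mutate, 
      # rotten count among the other rows, divided by the number of other rows
      Perc = (sum(Rotten) - Rotten)/(length(Rotten)-1))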

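mutate works elementwise within each group, so you can subtract each row's own Rotten value from the group sum and divide by the number of other rows, length(Rotten) - 1. A group with only one row, such as Banana, gives 0/0, which R reports as NaN rather than NA.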

    Fruit Rotten Desired_Outcome      Perc
1   Apple      1            0.33 0.3333333
2   Apple      1            0.33 0.3333333
3   Apple      0            0.66 0.6666667
4   Apple      0            0.66 0.6666667
5  Banana      1              NA       NaN
6  Cherry      0            0.00 0.0000000
7  Cherry      0            0.00 0.0000000
8  Cherry      0            0.00 0.0000000
9    Pear      1            0.66 0.6666667
10   Pear      1            0.66 0.6666667
11   Pear      1            0.66 0.6666667
12   Pear      0            1.00 1.0000000
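
For comparison, the same leave-one-out calculation can be sketched in base R with ave(), which applies a function within each group and keeps the original row order. This is only an illustration and not part of the answer above: the helper name loo_share is made up here, and converting NaN to NA is an assumption about what the asker wants for single-row groups.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3), "Banana"),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Hypothetical helper: share of rotten fruit among the *other* rows of a group.
loo_share <- function(x) (sum(x) - x) / (length(x) - 1)

# ave() evaluates loo_share separately for each Fruit group.
df$Perc <- ave(df$Rotten, df$Fruit, FUN = loo_share)

# A single-row group gives 0/0 = NaN; replace it with NA (assumed preference).
df$Perc[is.nan(df$Perc)] <- NA

df
```

Unlike the ddply output, the rows stay in their original order here, so Banana remains in row 12.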
Justin
  • This is a very useful solution. I came across it when searching for a solution to a very similar problem. Would you care to explain why the proposed syntax uses `.(Fruit)`? My initial thinking was that the syntax would be `.variables = c("Fruit")`, as the manual suggests for splitting a `data.frame` in `ddply`. – Konrad Jun 29 '15 at 11:01
  • See the help for that `.` function, `?plyr::.`. It lets you "capture the name of variables, not their current value". – Justin Jun 30 '15 at 17:13
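
To illustrate the point raised in these comments, here is a small sketch, assuming the plyr package is installed, that runs the grouped calculation once with `.(Fruit)` and once with the character vector `"Fruit"`. The variable names by_quoted and by_string are made up for the example; ddply documents both forms as valid for `.variables`.

```r
library(plyr)

df <- data.frame(
  Fruit  = c(rep("Apple", 4), rep("Pear", 4), rep("Cherry", 3), "Banana"),
  Rotten = c(1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1)
)

# Grouping variable captured unevaluated with plyr's . function.
by_quoted <- ddply(df, .(Fruit), mutate,
                   Perc = (sum(Rotten) - Rotten) / (length(Rotten) - 1))

# Grouping variable given as a character vector instead.
by_string <- ddply(df, "Fruit", mutate,
                   Perc = (sum(Rotten) - Rotten) / (length(Rotten) - 1))

# Compare the two results.
all.equal(by_quoted, by_string)
```

The `.()` form mainly saves typing quotes and allows expressions of the columns to be used as grouping variables.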