return rows with max/min value of column, by group, using plyr::ddply

Question

I found an answer (now deleted) to this question, and I'm curious why it doesn't work.

Question is: return the row corresponding to the minimum value, by group.

So for example, given the dataset:

df <- data.frame(State = c(rep('AK',4),rep('RI',4)),
                   Company = LETTERS[1:8],
                   Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))

...the correct answer is:

    State Company Employees
 1:    AK       D        24
 2:    RI       E        19

as can be obtained, for example, by

library(data.table); setDT(df)[ , .SD[which.min(Employees)], by = State]

My question is why this plyr::ddply command doesn't work:

library(plyr)
ddply(df, .(State), summarise, Employees=min(Employees), 
      Company=Company[which.min(Employees)])
# returns:
#   State Employees Company
# 1    AK        24       A
# 2    RI        19       E

In other words, why is which.min(Employees) returning 1 for each group, instead of c(4,1)? Note that outside of ddply, this works:

summarise(df, minEmp = min(Employees), whichMin = which.min(Employees))
#   minEmp whichMin
# 1     19        5

I don't use plyr much, but I'd like to know the right way to do it, if there's a reasonable one.

@hrbrmstr I saw you replied to my comment but then it disappeared -- just curious about what the right way to do it using `plyr` would be... — C8H10N4O2, Feb 07 '17 at 20:03

score 1 · Accepted Answer · answered Feb 07 '17 at 20:06

I'm getting the correct answer with the following; not sure what differs in your case.

library(plyr)
ddply(df, .(State), function(x) x[which.min(x$Employees), ])
# returns:
#   State Company Employees
# 1    AK       D        24
# 2    RI       E        19
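
As for why the `summarise` version in the question returns the wrong company: as far as I can tell, `plyr::summarise` evaluates its arguments one at a time and stores each result under its name before moving on to the next. So once `Employees = min(Employees)` has run, `Employees` is a single number within that group, and `which.min(Employees)` in the next argument can only return 1. The sketch below reuses `df` from the question; the names `shadowed`, `fixed_order`, `renamed` and `minEmp` are only illustrative.

```r
library(plyr)

df <- data.frame(State = c(rep('AK', 4), rep('RI', 4)),
                 Company = LETTERS[1:8],
                 Employees = c(82L, 104L, 37L, 24L, 19L, 118L, 88L, 42L))

# Original attempt: Employees is overwritten by its minimum first,
# so which.min() then sees a length-1 vector and returns 1.
shadowed <- ddply(df, .(State), summarise,
                  Employees = min(Employees),
                  Company = Company[which.min(Employees)])

# Option 1: pick the company first, while Employees is still the full column.
fixed_order <- ddply(df, .(State), summarise,
                     Company = Company[which.min(Employees)],
                     Employees = min(Employees))

# Option 2: give the minimum a new name so it does not shadow the column.
renamed <- ddply(df, .(State), summarise,
                 minEmp = min(Employees),
                 Company = Company[which.min(Employees)])
```

This would also explain why the standalone `summarise(df, minEmp = ..., whichMin = ...)` call in the question works: neither output name collides with an existing column.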

answered Feb 07 '17 at 20:06

joel.wilson

well that was simple enough, I'll accept when permitted – C8H10N4O2 Feb 07 '17 at 20:08
now it's giving you the results? @C8H10N4O2 what was the problem then? – joel.wilson Feb 07 '17 at 20:13
there's no problem with your solution. it works. what I tried before (as described in question) didn't work. – C8H10N4O2 Feb 07 '17 at 20:15
