I have a data frame:

df <- data.frame(P = c("A","A","A", "B","B","B", "C", "C", "C"),
                 index = c("ind1","ind2","ind3","ind1","ind2","ind3","ind1","ind2","ind3"),
                 var = c(2,1,1,8,5,4,2,8,6))

I would like to get ALL the minimum values of var, with their associated index, for each value of P. I can do this:

library(data.table)
DT <- data.table(df)
DT[, .SD[which.min(var)], by = P]

which gives only one minimum value of var (the first one) for each value of P:

   P index  var
1: A  ind2   1
2: B  ind3   4
3: C  ind1   2

And I would like:

   P index  var
1: A  ind2   1
2: A  ind3   1
3: B  ind3   4
4: C  ind1   2

Ideas?

DJack
  • From my understanding, you want the minimum values for each unique pairing of index and P. However, how is it that your desired output has two observations for P == 'A', yet by that same rule only one observation each for P values of 'B' and 'C'? – Steven_ Dec 23 '15 at 15:39
  • http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column this should help – TBSRounder Dec 23 '15 at 15:40

2 Answers

Using dplyr, you could use one of the following:

library(dplyr)
DT %>% group_by(P) %>% filter(var == min(var))  # or %in% instead of ==
#Source: local data table [4 x 3]
#Groups: P
#
#       P  index   var
#  (fctr) (fctr) (dbl)
#1      A   ind2     1
#2      A   ind3     1
#3      B   ind3     4
#4      C   ind1     2

Or

DT %>% group_by(P) %>% top_n(1, desc(var)) # top_n() returns multiple rows in case of ties
#Source: local data table [4 x 3]
#Groups: P
#
#       P  index   var
#  (fctr) (fctr) (dbl)
#1      A   ind2     1
#2      A   ind3     1
#3      B   ind3     4
#4      C   ind1     2

Or

DT %>% group_by(P) %>% filter(min_rank(var) == 1)
#Source: local data table [4 x 3]
#Groups: P
#
#       P  index   var
#  (fctr) (fctr) (dbl)
#1      A   ind2     1
#2      A   ind3     1
#3      B   ind3     4
#4      C   ind1     2
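
As a further option, here is a minimal sketch assuming dplyr 1.0.0 or later, where top_n() is superseded and slice_min() is the suggested replacement. It rebuilds the question's data frame so that it runs on its own; the name res_slice is only illustrative. slice_min() keeps ties by default (with_ties = TRUE), which is what returns both rows for P == "A".

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  P     = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  index = c("ind1", "ind2", "ind3", "ind1", "ind2", "ind3", "ind1", "ind2", "ind3"),
  var   = c(2, 1, 1, 8, 5, 4, 2, 8, 6)
)

# Keep, within each P, every row whose var equals the group minimum.
# with_ties = TRUE is the default; it is spelled out here for clarity.
res_slice <- df %>%
  group_by(P) %>%
  slice_min(var, n = 1, with_ties = TRUE) %>%
  ungroup()

print(res_slice)
```

Setting with_ties = FALSE would instead return exactly one row per group, similar to the which.min() behaviour in the question.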
talat

From the help page for which.min, you'll note that it says:

Determines the location, i.e., index of the (first) minimum or maximum of a numeric (or logical) vector.

If you want all the rows that match the minimum, compare var against min(var) with == instead. Thus, continuing with your approach, try:

DT[, .SD[var == min(var)], by = P]
##    P index var
## 1: A  ind2   1
## 2: A  ind3   1
## 3: B  ind3   4
## 4: C  ind1   2
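
A variant that is sometimes suggested for larger tables is to collect the matching row numbers with .I and subset once, rather than subsetting .SD in every group. The following is only a sketch of that idiom on the question's data, not a benchmarked recommendation; the names idx and res_I are illustrative.

```r
library(data.table)

# Same data as in the question
DT <- data.table(
  P     = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  index = c("ind1", "ind2", "ind3", "ind1", "ind2", "ind3", "ind1", "ind2", "ind3"),
  var   = c(2, 1, 1, 8, 5, 4, 2, 8, 6)
)

# .I holds the row numbers of DT belonging to the current group;
# keep those where var equals the group minimum (ties included).
idx <- DT[, .I[var == min(var)], by = P]$V1

# Subset the original table once using the collected row numbers
res_I <- DT[idx]
print(res_I)
```

The unnamed j expression ends up in a column called V1, which is why the row numbers are extracted with $V1.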
A5C1D2H2I1M1N2O1R2T1