0

How can I select x highest values for each group in data.table?

For example, I would like to take two highest values (Val) for each group (Date). So for this dataset:

Date    Name    Val
01/01/2010  A   3
01/01/2010  B   2
01/01/2010  C   1
02/01/2010  A   4
02/01/2010  B   2
02/01/2010  C   3
02/01/2010  D   1

code should return:

Date    Name    Val
01/01/2010  A   3
01/01/2010  B   2
02/01/2010  A   4
02/01/2010  C   3
Kulis
  • 988
  • 3
  • 11
  • 25
  • 2
    `DT[order(Val), tail(.SD, x), by=Date]`, I guess. – Frank Mar 05 '18 at 19:29
  • 1
    Related: [*Concise R data.table syntax for modal value (most frequent) by group*](https://stackoverflow.com/q/34403017/2204410) – Jaap Mar 05 '18 at 19:35

1 Answers1

1
df <- read.table(text = "Date    Name    Val
01/01/2010  A   3
                 01/01/2010  B   2
                 01/01/2010  C   1
                 02/01/2010  A   4
                 02/01/2010  B   2
                 02/01/2010  C   3
                 02/01/2010  D   1", 
                 header = TRUE, stringsAsFactors = FALSE)

setDT(df)
df[, max_val := max(Val), by = Date]
df[, max_sec := order(Val, decreasing = T)[2], by = Date]
df <- df[Val == max_val | Val == max_sec, ]
df[, c("max_val", "max_sec") := NULL]

         Date Name Val
1: 01/01/2010    A   3
2: 01/01/2010    B   2
3: 02/01/2010    A   4
4: 02/01/2010    C   3
sm925
  • 2,648
  • 1
  • 16
  • 28