0

Consider the following:

DT <- data.table(a = sample(1:2), b = sample(1:100, 10), d = rnorm(10))

How to display the whole DT containing only the 3 highest values b for each level of a?

I found an approximate solution here:

> DT[order(-b), head(b, 3), a]
   a V1
1: 2 99
2: 2 83
3: 2 75
4: 1 96
5: 1 71
6: 1 67

However, it will only display the columns a and b (named as V1). I would like to obtain the same DT but with all the columns and the original columns names. Please consider that in practice my DT has many columns. I'm aiming for an elegant solution that doesn't requires to list all the columns names.

mat
  • 2,412
  • 5
  • 31
  • 69
  • see the second way and the third way in the answer of the post you mentioned. – mt1022 Nov 29 '19 at 11:47
  • @mt1022 I was looking for something more elegant, like the first answer based on `head()` – mat Nov 29 '19 at 11:50
  • 1
    You can get a simpler solution by modifying the one proposed by Frank: `DT[order(-b), .SD[1:.N<=3],"a"]` – mt1022 Nov 29 '19 at 11:55

1 Answers1

2

We can separate the calls and filter top 3 rows by group.

library(data.table)
DT[order(-b),head(.SD, 3),a]

#   a   b          d
#1: 1 100  1.4647474
#2: 1  61 -1.1250266
#3: 1  51  0.9435628
#4: 2  82  0.3302404
#5: 2  72 -0.0219803
#6: 2  55  1.6865777
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213