I have a dataset with ~100 million rows, structured like this DT:
library(data.table)
DT <- data.table(a = c(3, 2, 1, 7, 6, 5),
                 b = c("1", "1", "1", "2", "2", "2"),
                 c = c("2", "2", "2", "3", "3", "3"),
                 d = c(5, 6, 7, 8, 9, 0))
To select only the row with the maximum value of a within each group (b, c), I use
DT[DT[, .I[which.max(a)], by = list(b,c)]$V1]
which gives
   a b c d
1: 3 1 2 5
2: 7 2 3 8
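To make clear what the one-liner does, here is a minimal sketch that splits it into its two steps. The names idx and res are introduced only for illustration; everything else is taken from the example above.

```r
library(data.table)

DT <- data.table(a = c(3, 2, 1, 7, 6, 5),
                 b = c("1", "1", "1", "2", "2", "2"),
                 c = c("2", "2", "2", "3", "3", "3"),
                 d = c(5, 6, 7, 8, 9, 0))

# Step 1: within each (b, c) group, .I holds that group's row numbers in DT,
# and which.max(a) gives the position of the first maximum of a in the group.
# The result has columns b, c and V1 (one global row number per group).
idx <- DT[, .I[which.max(a)], by = list(b, c)]

# Step 2: subset the original table by those row numbers.
res <- DT[idx$V1]
print(res)
```

Note that which.max() returns only the first maximum, so if several rows in a group tie for the maximum of a, only the first of them is kept.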
It works fine, but I wonder whether there is a faster or more optimal solution. Any advice is welcome!