I am given a large data.table, e.g.
n <- 7
dt <- data.table(id_1=sample(1:10^(n-1),10^n,replace=TRUE), other=sample(letters[1:20],10^n,replace=TRUE), val=rnorm(10^n,mean=10^4,sd=1000))
> structure(dt)
id_1 other val
1: 914718 o 9623.078
2: 695164 f 10323.943
3: 53186 h 10930.825
4: 496575 p 9964.064
5: 474733 l 10759.779
---
9999996: 650001 p 9653.125
9999997: 225775 i 8945.636
9999998: 372827 d 8947.095
9999999: 268678 e 8371.433
10000000: 730810 i 10150.311
and I would like to create a data.table that for each value of the indicator id_1
only has one row, namely the one with the largest value in the column val
.
The following code seems to work:
dt[, .SD[which.max(val)], by = .(id_1)]
However, it is very slow for large tables. Is there a quicker way?