We can also do this with .I
to return the row index and use that for subsetting the rows.
d[d[, .I[which.max(x==1)], by = a]$V1]
# a x y
#1: 1 1 2
#2: 2 0 1
#3: 3 1 1
In the current versions of data.table
, .I
approach is more efficient compared to the .SD
for subsetting rows (However, it could change in the future). This is also a similar post
Here is another option with order
(setkey
can also be used - for efficiency) the dataset by 'a' and 'x' after grouping by 'a', and then get the first row with head
d[order(a ,-x), head(.SD, 1) ,by = a]
# a x y
#1: 1 1 2
#2: 2 0 1
#3: 3 1 1
Benchmarks
Initially, we were thinking about benchmarking on > 1e6, but the .SD
methods are taking time, so comparing on 3e5
rows using data.table_1.9.7
set.seed(24)
d1 <- data.table(a = rep(1:1e5, 3), x = sample(0:1, 1e5*3,
replace=TRUE), y = rnorm(1e5*3))
system.time(d1[, .SD[which.max(x == 1)], by = a])
# user system elapsed
# 56.21 30.64 86.42
system.time(d1[, .SD[match(1L, x, nomatch = 1L)], by = a])
# user system elapsed
# 55.27 30.07 83.75
system.time(d1[d1[, .I[which.max(x==1)], by = a]$V1])
# user system elapsed
# 0.19 0.00 0.19
system.time(d1[order(a ,-x), head(.SD, 1) ,by = a])
# user system elapsed
# 0.03 0.00 0.04