I am wondering if there is any way to use grouping in data.tables to select rows from within each group. The easiest way to explain what I'm looking for is by way of a minimal example.
1) A toy data.table to work with:
set.seed(123)
mydt = data.table(
grp = rep(LETTERS[1:5], each = 20),
site = rep(letters[1:20], 5),
V1 = rnorm(100))
2) This doesn't work:
Now suppose we want to extract the row from each group of grp
for which V1 is the smallest. The naiive approach below fails for obvious reasons:
mydt[which.min(V1), .(site, V1), by=grp]
# grp site V1
# 1: D l -2.309169
This only returns a single value (the smallest V1 in the whole table), because the subsetting on i
is conducted prior to the looping over by
. Whereas what we really want is the smallest row within each group.
3) This does work, but is hacky:
res = data.table()
for (mygrp in unique(mydt[,grp])) {
smallest = mydt[grp == mygrp, which.min(V1)]
res = rbind(res, mydt[grp == mygrp][smallest])
}
# grp site V1
# 1: A r -1.966617
# 2: B f -1.686693
# 3: C q -1.548753
# 4: D l -2.309169
# 5: E t -1.026421
4) Is there a better way?
Is there a more canonical data.table style apporach to doing this?