I want to calculate an aggregated value from a data.table
and use it to replace the original values. I tried the following approach
library(data.table)
(DT <- data.table(grp = rep(LETTERS[1:3], 3),
val = 1:9,
val2 = rep(letters[24:26], each = 3)))
# grp val val2
# 1: A 1 x
# 2: B 2 x
# 3: C 3 x
# 4: A 4 y
# 5: B 5 y
# 6: C 6 y
# 7: A 7 z
# 8: B 8 z
# 9: C 9 z
(agg <- DT[, .SD[which.min(val)], grp])
# grp val val2
# 1: A 1 x
# 2: B 2 x
# 3: C 3 x
DT[, val3 := "New Value"]
agg[DT, on = "grp"][, .SD, .SDcols = !patterns("^i\\.")]
# grp val val2 val3
# 1: A 1 x New Value
# 2: B 2 x New Value
# 3: C 3 x New Value
# 4: A 1 x New Value
# 5: B 2 x New Value
# 6: C 3 x New Value
# 7: A 1 x New Value
# 8: B 2 x New Value
# 9: C 3 x New Value
While this approach works in this toy example I have a couple of concerns:
I would like to avoid chaining in the first place to avoid massive copying of unused columns (my real dataset is rather big
2e5 x 200
) Quote from data.table FAQ 1.10:We therefore strongly encourage X[Y, j] instead of X[Y]
Putting the
.SD
part into thej
slot in the first subset, won't workagg[DT, .SD, on = "grp", .SDcols = !patterns("^i\\.")] # Error in do_patterns(colsub, names_x) : Pattern not found: [^i\.]
- The
patterns
approach feels a bit hackish and relies on the assumption thatdata.table
will always prefix the columns fromY
withi.
. If the package maintainers would change that for whatever reason my code would break. - I am still learning
data.table
and I want to learn what is the most "idiomatic" way of solving this indata.table
How can I solve this problem while avoiding to make unnecessary copies to save resources?
Note. For what it matters: I do not want to change the values in DT
by reference.