I want to use column names in an assignment by reference (:=) within a data.table. The function being called performs a per-row calculation across several columns. I am using the current development version of data.table (v1.9.7), which makes the parameter "with=FALSE" unnecessary.
A minimal working example with the column names written out explicitly:

```r
library(data.table)
DT <- data.table(a = 1:10, b = seq(2, 20, 2), c = seq(5, 50, 5))
DT[, out := sum(a, b), by = 1:nrow(DT)]  # one group per row
```
But if I have many columns and call the function with a single variable holding the (selected) column names, the code fails:

```r
DT <- data.table(a = 1:10, b = seq(2, 20, 2))
col <- colnames(DT)
DT[, out := sum(col), by = 1:nrow(DT)]  # error: sum() receives the strings "a", "b", not the columns
```
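The failure comes from `col` being an ordinary character vector: inside `j`, `sum(col)` sees the strings "a" and "b" rather than the columns they name. A minimal sketch, assuming the same two-column `DT`: passing the names through `.SDcols` makes `.SD` a table of exactly those columns, and base R's `rowSums()` then adds them row by row, so a plain sum needs no `by` at all.

```r
library(data.table)

DT <- data.table(a = 1:10, b = seq(2, 20, 2))
col <- colnames(DT)

# col is only c("a", "b"): sum() cannot add character strings and errors
msg <- tryCatch(sum(col), error = function(e) conditionMessage(e))
print(msg)

# .SDcols selects the named columns into .SD; rowSums() sums across them per row
DT[, out := rowSums(.SD), .SDcols = col]
print(DT)
```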
EDIT:
David Arenburg's answer

```r
DT[, out := Reduce(`+`, .SD), .SDcols = col]
```

works for this specific case, but I don't understand how to apply this approach to a call to another function. I wrote the following function to test it:
```r
# Adds x to the mean of all remaining arguments (y and ...)
myfun <- function(x, y, ...){
  in.tmp1 <- x
  in.tmp2 <- c(y, ...)
  out.tmp <- in.tmp1 + mean(in.tmp2)
  return(out.tmp)
}
```
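The `by = 1:nrow(DT)` grouping matters for this function: `mean()` collapses everything it receives into one number, so calling `myfun` on whole columns pools every value of `y` and `...` into a single mean instead of computing one mean per row. A small base R check with made-up values; `mapply()` here is only a stand-in for the per-row grouping:

```r
myfun <- function(x, y, ...){
  in.tmp1 <- x
  in.tmp2 <- c(y, ...)
  out.tmp <- in.tmp1 + mean(in.tmp2)
  return(out.tmp)
}

# One row at a time: 1 + mean(c(2, 5)) = 4.5
one_row <- myfun(1, 2, 5)

# Whole columns at once: mean() pools all six values of y and ... (mean = 7),
# so every element of x gets the same offset
whole <- myfun(1:3, c(2, 4, 6), c(5, 10, 15))

# Row-wise results for the same data: one myfun() call per row
per_row <- mapply(myfun, 1:3, c(2, 4, 6), c(5, 10, 15))
print(whole)
print(per_row)
```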
Again, with the column names written out explicitly, the following works:

```r
DT <- data.table(a = 1:10, b = seq(2, 20, 2), c = seq(5, 50, 5))
DT[, out := myfun(a, b, c), by = 1:nrow(DT)]
```
But I can't work out a general solution for the case where a large subset of columns is specified by a character vector of their names.
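One way to generalise, sketched under the assumption that the first selected column should map to `x` and the rest to `y` and `...`. Unlike ``Reduce(`+`, .SD)``, which combines columns two at a time, `myfun` needs all of its arguments in one call, and `do.call()` does that by spreading the columns of `.SD` into separate positional arguments. The `unname()` step matters: `.SD` carries the names `a`, `b`, `c`, which match neither `x` nor `y`, so with the names kept all three would land in `...` and `x` would be missing. `cols` is a hypothetical variable name, and `seq_len(nrow(DT))` plays the role of `1:nrow(DT)`.

```r
library(data.table)

myfun <- function(x, y, ...){
  in.tmp1 <- x
  in.tmp2 <- c(y, ...)
  out.tmp <- in.tmp1 + mean(in.tmp2)
  return(out.tmp)
}

DT <- data.table(a = 1:10, b = seq(2, 20, 2), c = seq(5, 50, 5))
cols <- c("a", "b", "c")  # hypothetical: any character vector of column names

# One group per row; .SD holds that row's selected columns, and do.call()
# passes them positionally, which amounts to myfun(a, b, c) for each row
DT[, out := do.call(myfun, unname(as.list(.SD))),
   by = seq_len(nrow(DT)), .SDcols = cols]

# Cross-check against the explicit call from the question
DT[, check := myfun(a, b, c), by = seq_len(nrow(DT))]

# Vectorised alternative for this particular myfun: first column plus the
# row mean of the remaining columns, computed without per-row grouping
DT[, fast := .SD[[1L]] + rowMeans(.SD[, -1L, with = FALSE]), .SDcols = cols]
print(DT)
```

The row-wise version calls `myfun` once per row, which gets slow on large tables; when the function can be expressed with vectorised helpers such as `rowMeans()`, as in the last assignment, that per-row overhead disappears.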