When I perform .SD operations using data.table, I often encounter situations in which it would be useful to access column name attributes within the lapply/.SD statement. Normally, situations like these arise when I need to perform a data.table operation which involves columns of an external data.table.
Say, for instance, that I have a data.table dt with two columns. In addition, I have a data.table mult which serves as a "multiplication matrix", i.e. it contains the factors by which I want to multiply the columns in dt.
library(data.table)
dt = data.table(Val1 = rep(1,5), Val2 = rep(2,5))
mult = data.table(Val1 = 5, Val2 = 10)
> dt
Val1 Val2
1: 1 2
2: 1 2
3: 1 2
4: 1 2
5: 1 2
> mult
Val1 Val2
1: 5 10
In this elementary example, I want to multiply Val1 and Val2 in dt by the respective multiplication factors in mult. In base R, this can be done with sapply:
mat = sapply(colnames(dt), function(x){
dt[[x]] * mult[[x]]
})
> data.table(mat)
Val1 Val2
1: 5 20
2: 5 20
3: 5 20
4: 5 20
5: 5 20
Now, this works because sapply is applied across the column names of dt, not the columns themselves.
Say that I would like to perform the same operation using data.table/.SD. The issue is that I cannot find a way of accessing the 'current' column name within the lapply statement, since lapply iterates over the columns of .SD themselves, not over their names. Hence, I cannot look up the appropriate multiplication factor in the mult table from within the lapply statement.
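To illustrate the problem, here is a small sketch (the variable name chk is just for illustration). As far as I can tell, each x passed to the function is a plain vector that carries no record of the column it came from:

```r
library(data.table)

dt <- data.table(Val1 = rep(1, 5), Val2 = rep(2, 5))

# Inside lapply(.SD, ...), x is only the column's values;
# it has no names attribute that would identify the column.
chk <- dt[, lapply(.SD, function(x) is.null(names(x))),
          .SDcols = c("Val1", "Val2")]
print(chk)
```

So there seems to be nothing on x itself that I could use to index into mult.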
The pseudocode of what I would like to do is below:
dt[, lapply(.SD, function(x){
# name = name of the column currently being iterated over in .SD, i.e. first 'Val1' and then 'Val2'
# return(x*mult[[name]])
}), .SDcols = c('Val1', 'Val2')]
I am aware that there are workarounds available using explicit indexing in the lapply statement (i.e. lapply(1:ncol(dt), function(i) {...})), but I would like to understand whether it is feasible using .SD instead.
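For reference, the index-based workaround I have in mind looks roughly like the sketch below. The helper names cols and res are only for illustration, and I use seq_along rather than 1:ncol(dt) so that the loop follows the columns selected by .SDcols:

```r
library(data.table)

dt   <- data.table(Val1 = rep(1, 5), Val2 = rep(2, 5))
mult <- data.table(Val1 = 5, Val2 = 10)

cols <- c("Val1", "Val2")

# Iterate over column positions instead of the columns themselves,
# so that the matching name can be looked up for each position.
res <- dt[, lapply(seq_along(.SD), function(i) {
  name <- cols[i]            # name of the i-th column in .SD
  .SD[[i]] * mult[[name]]    # multiply by the factor of the same name
}), .SDcols = cols]

# lapply over indices returns an unnamed list, so restore the column names.
setnames(res, cols)
print(res)
```

This gives the same values as the sapply version, but it sidesteps iterating over .SD directly, which is the part I would like to avoid.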
Thank you in advance.