I want to apply a transformation (whose type, loosely speaking, is "vector" -> "vector") to a list of columns in a data table, and this transformation will involve a grouping operation.
Here is the setup and what I would like to achieve:
library(data.table)
set.seed(123)
n <- 1000
DT <- data.table(
date = seq.Date(as.Date('2000/1/1'), by='day', length.out = n),
A = runif(n),
B = rnorm(n),
C = rexp(n))
DT[, A.prime := (A - mean(A))/sd(A), by=year(date)]
DT[, B.prime := (B - mean(B))/sd(B), by=year(date)]
DT[, C.prime := (C - mean(C))/sd(C), by=year(date)]
The goal is to avoid typing out the column names. In my actual application, I have a list of columns I would like to apply this transformation to.
library(data.table)
set.seed(123)
n <- 1000
DT <- data.table(
date = seq.Date(as.Date('2000/1/1'), by='day', length.out = n),
A = runif(n),
B = rnorm(n),
C = rexp(n))
columns <- c("A", "B", "C")
for (x in columns) {
# This doesn't work.
# target <- DT[, (x - mean(x, na.rm=TRUE))/sd(x, na.rm = TRUE), by=year(date)]
# This doesn't work.
#target <- DT[, (..x - mean(..x, na.rm=TRUE))/sd(..x, na.rm = TRUE), by=year(date)]
# THIS WORKS! But it is tedious writing "get(x)" every time.
target <- DT[, (get(x) - mean(get(x), na.rm=TRUE))/sd(get(x), na.rm = TRUE), by=year(date)][, V1]
set(DT, j = paste0(x, ".prime"), value = target)
}
Question: What is the idiomatic way to achieve the above result? There are two things which may be possibly be improved:
- How to avoid typing out
get(x)
every time I usex
to access a column? - Is accessing
[, V1]
the most efficient way of doing this? Is it possible to updateDT
directly by reference, without creating an intermediate data.table?