Consider the standard data.table syntax DT[i, j, ...]. Since .SD is only defined in j and NULL in i, is there any way to implicitly (desired) or explicitly (via something like .SD) refer to the current data.table in a function in i?
Use Case
I would like to write a function that filters on standard columns. The column names are the same across multiple tables and somewhat verbose. To save typing, I would like to write a function like this:
library(data.table)
dt <- data.table(postal_code = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))
dt
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      SPEEDO        Walker
#> 3:      USA421      Thompson
# Filter all customers with a common postal code prefix
# whose surname starts with specific letters
extract <- function(x, y, DT) {
  DT[, startsWith(postal_code, x) & startsWith(customer_name, y)]
}
# does not work
dt[extract("USA", "T", .SD)]
#> Error in .checkTypos(e, names_x): Object 'postal_code' not found.
#> Perhaps you intended postal_code
# works, but requires specifying the data.table explicitly,
# with the drawback that it cannot be applied to, e.g., a grouped .SD
# in a nested call
dt[extract("USA", "T", dt)]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson
Desired (pseudo code)
dt[extract("USA", "T")]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson
# but also
# subsequent steps in j
dt[extract("USA", "T"), relevant := TRUE][]
#>    postal_code customer_name relevant
#> 1:      USA123        Taylor     TRUE
#> 2:      SPEEDO        Walker       NA
#> 3:      USA421      Thompson     TRUE
# using other data.tables
another_dt[extract("USA", "T")]
yet_another_dt[extract("USA", "T")]
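
One direction that might get close to the desired syntax, offered as a sketch rather than a confirmed solution: data.table seems to evaluate i in an environment where the columns of the current table are visible, so a helper could look the columns up through parent.frame() instead of receiving the table as an argument. This relies on how [.data.table evaluates i internally, which is an assumption here and not documented API. The env argument is a hypothetical addition for illustration.

```r
library(data.table)

dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))

# Assumption: when extract() is called inside i, parent.frame() is the
# environment in which data.table evaluates i, so the column names can be
# resolved there. `env` is a hypothetical argument, not part of data.table.
extract <- function(x, y, env = parent.frame()) {
  postal_code   <- get("postal_code", envir = env)
  customer_name <- get("customer_name", envir = env)
  startsWith(postal_code, x) & startsWith(customer_name, y)
}

# Plain filtering in i, without naming the table inside the call
filtered <- dt[extract("USA", "T")]

# Combined with a subsequent step in j (on a copy, to leave dt untouched)
flagged <- copy(dt)[extract("USA", "T"), relevant := TRUE]
```

Because the lookup goes through the calling environment rather than a fixed table, the same helper should in principle work for any table that has both columns, but it will fail with an error from get() if a column is missing.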