Is there a way to self reference a data.table in i

Question

Consider the standard data.table syntax DT[i, j, ...]. Since .SD is only defined in j and NULL in i, is there any way to implicitly (desired) or explicitly (via something like .SD) refer to the current data.table in a function in i?

Use Case

I would like to write a function that filters on standard columns. The column names are the same across multiple tables and somewhat verbose. To type less and code faster, I would like to write a function like this:

library(data.table)
dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))
dt
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      SPEEDO        Walker
#> 3:      USA421      Thompson

# Filter all customers whose postal code starts with a common prefix
# and whose surname starts with specific letters
extract <- function(x, y, DT) {
  DT[, startsWith(postal_code, x) & startsWith(customer_name, y)]
}


# does not work
dt[extract("USA", "T", .SD)]
#> Error in .checkTypos(e, names_x): Object 'postal_code' not found.
#>    Perhaps you intended postal_code

# works, but requires specifying the data.table explicitly,
# with the drawback that it cannot be applied to, e.g., a grouped .SD
# in a nested call
dt[extract("USA", "T", dt)]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson

Desired (pseudo code)

dt[extract("USA", "T")]
#>    postal_code customer_name
#> 1:      USA123        Taylor
#> 2:      USA421      Thompson

# but also
# subsequent steps in j
dt[extract("USA", "T"), relevant := TRUE][]
#>    postal_code customer_name relevant
#> 1:      USA123        Taylor     TRUE
#> 2:      SPEEDO        Walker       NA
#> 3:      USA421      Thompson     TRUE

# using other data.tables
another_dt[extract("USA", "T")]
yet_another_dt[extract("USA", "T")]

Seems like `fcase` can handle your second use case: `dt[, relevant := fcase(extract("USA", "T", dt), TRUE, default = NA)][]`. Do you have other uses in mind that `fcase` wouldn't handle? – jblood94, Dec 08 '21 at 12:50
Thanks for your comment. I know that there are multiple ways to get the desired result in `j`. However, I really would like to trigger everything in `i`, since it is much more versatile and convenient. Often I first inspect the filtered rows and update them afterwards. Furthermore, `dt[extract("USA", "T"), relevant := TRUE]` would be much clearer to read than `dt[, relevant := fcase(extract("USA", "T", dt), TRUE, default = NA)]`. The question is not "How can I get this result?" but specifically "How can I use such a function in `i`?" – mnist, Dec 08 '21 at 12:58
It's admittedly not as readable, but wouldn't the approach in this answer give the desired versatility? https://stackoverflow.com/a/57091155/9463489 – jblood94, Dec 08 '21 at 14:03
Just for the record, I think this may be a related (open) issue: [New symbol .D to refer to x in i](https://github.com/Rdatatable/data.table/issues/4685) – Henrik, Dec 08 '21 at 14:16
@jblood94 Not exactly, since I would again have to type `dt`, which I am trying to avoid. – mnist, Dec 08 '21 at 14:24

score 1 · Answer 1 · answered Dec 08 '21 at 11:10

I'm not a data.table expert, but you can try the following workaround:

> dt[,.SD[extract("USA", "T", .SD)]]
   postal_code customer_name
1:      USA123        Taylor
2:      USA421      Thompson

Here the self-reference happens in j, by subsetting .SD with itself.

answered Dec 08 '21 at 11:10 by ThomasIsCoding

Thank you for your answer. However, filtering is not the only use case. I would also like to use it for all sorts of `i` actions, e.g. updating. Sorry for not being precise enough. I have updated my question accordingly. – mnist Dec 08 '21 at 11:27

score 0 · Answer 2 · answered Dec 08 '21 at 11:27

Here is a possible approach...

#create named vector
mystr <- c(postal_code = "USA", customer_name = "T")
#build query text
query <- paste0("grepl(\"^", mystr, "\", ", names(mystr), ")", collapse = " & ")
#eval/parse dynamic text
dt[eval(parse(text = query)), ]
#    postal_code customer_name
# 1:      USA123        Taylor
# 2:      USA421      Thompson

answered Dec 08 '21 at 11:27 by Wimpel

Nice idea to use `eval(parse(...))`. However, is there any way to "hide" it inside `extract()`? Using `eval(parse(...))` in top-level code is undesirable. – mnist Dec 08 '21 at 11:33
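
One way to move the expression building into the function, sketched here under assumptions rather than taken from either answer: a variant of `extract()` that returns an unevaluated call built with `bquote()`, which is then evaluated in `i` through `eval()`. The two-argument signature is hypothetical and replaces the three-argument `extract()` from the question. This still requires typing `eval()`, so it is not the exact `dt[extract("USA", "T")]` form asked for, but the table name no longer has to be repeated.

```r
library(data.table)

dt <- data.table(postal_code   = c("USA123", "SPEEDO", "USA421"),
                 customer_name = c("Taylor", "Walker", "Thompson"))

# Hypothetical variant of extract(): it returns an unevaluated call instead of
# a logical vector, so it needs no reference to any particular table.
extract <- function(x, y) {
  # bquote() substitutes the values of x and y wherever .() appears
  bquote(startsWith(postal_code, .(x)) & startsWith(customer_name, .(y)))
}

# eval() in i evaluates the returned call against the columns of dt
res <- dt[eval(extract("USA", "T"))]

# the same call can be combined with an update by reference in j
dt[eval(extract("USA", "T")), relevant := TRUE]
```

Because the returned call only names columns, the same `eval(extract("USA", "T"))` should work in `i` for any other table that has `postal_code` and `customer_name` columns.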
