I want to pass a string containing an expression as an argument to a function that performs a data.table join. See below:

> library(data.table)
> 
> x <- data.table(Id  = c("A", "B", "C", "C"),
+                 X1  = c(1L, 3L, 5L, 7L),
+                 X2 = c(8L,12L,9L,18L),
+                 XY  = c("x2", "x4", "x6", "x8"))
> 
> z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))
> 
> 
> x[z, on = .(Id == ID), mult = "all", allow.cartesian=TRUE]
    Id X1 X2 XY Z1 Z2
 1:  C  5  9 x6  5 z5
 2:  C  7 18 x8  5 z5
 3:  C  5  9 x6  6 z6
 4:  C  7 18 x8  6 z6
 5:  C  5  9 x6  7 z7
 6:  C  7 18 x8  7 z7
 7:  C  5  9 x6  8 z8
 8:  C  7 18 x8  8 z8
 9:  C  5  9 x6  9 z9
10:  C  7 18 x8  9 z9
> 
> # Conversion to a function. Need to work out the piece - evaluate_string_to_correct_format
> data_table_join <- function(x,y,string){
+     
+     x[y, on = evaluate_string_to_correct_format(string),
+       mult = "all", allow.cartesian = TRUE]
+     
+ }

Here, string should be equal to ".(Id == ID)", but I am not sure how to make this work. I want to be able to replace multiple parts of the data.table call with strings, so this is just a minimal example.

JFG123

1 Answer

The on argument accepts a character vector, not just an expression, so you can simply pass on = "Id==ID". For multiple conditions, use a character vector with one element per condition, e.g. c("a==b", "x==y"). The examples in the ?data.table manual are also useful.

x[z, on = "Id==ID", mult = "all", allow.cartesian=TRUE]

x[z, on = c("Id==ID", "X1==Z1"), mult = "all", allow.cartesian=TRUE]
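
Applied to the function from the question, this means the string can be passed straight through to on with no conversion step. The following is a minimal sketch, not a definitive implementation; the names join_by_string and on_string are made up for illustration, and the strings are written as "Id==ID" rather than ".(Id == ID)".

```r
library(data.table)

x <- data.table(Id = c("A", "B", "C", "C"),
                X1 = c(1L, 3L, 5L, 7L),
                X2 = c(8L, 12L, 9L, 18L),
                XY = c("x2", "x4", "x6", "x8"))
z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))

# Hypothetical wrapper: on_string is a character vector of join conditions,
# one element per condition, handed to `on` unchanged.
join_by_string <- function(x, y, on_string) {
  x[y, on = on_string, mult = "all", allow.cartesian = TRUE]
}

res  <- join_by_string(x, z, "Id==ID")               # single condition
res2 <- join_by_string(x, z, c("Id==ID", "X1==Z1"))  # two conditions
```

Here res should match the 10-row result of the direct x[z, on = .(Id == ID), ...] call shown in the question.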

If you want to pass an expression rather than a string, you need to substitute it into the call inside your function.

data_table_join <- function(x, y, expr) {
  # pass expr as a quoted call, e.g. quote(.(Id == ID)); .expr is replaced by it
  eval(substitute(
    x[y, on = .expr,
      mult = "all", allow.cartesian = TRUE],
    list(.expr = expr)
  ))
}
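
A hedged sketch of the same substitute-and-eval idea, written so that the caller can type a bare .(Id == ID) without wrapping it in quote(). The name join_nse and the placeholder .on are illustrative only. The inner substitute(expr) captures the unevaluated argument, and the outer substitute() splices it into the data.table call before it is evaluated.

```r
library(data.table)

x <- data.table(Id = c("A", "B", "C", "C"),
                X1 = c(1L, 3L, 5L, 7L),
                X2 = c(8L, 12L, 9L, 18L),
                XY = c("x2", "x4", "x6", "x8"))
z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))

# Hypothetical variant: captures the caller's expression instead of evaluating it.
join_nse <- function(x, y, expr) {
  on_expr <- substitute(expr)  # the call .(Id == ID), still unevaluated
  eval(substitute(
    x[y, on = .on, mult = "all", allow.cartesian = TRUE],
    list(.on = on_expr)        # splice the captured call in place of .on
  ))
}

res <- join_nse(x, z, .(Id == ID))
```

The trade-off is that join_nse can no longer accept a pre-built quoted call stored in a variable; for that, the caller would have to pass the language object through a version that does not call substitute(expr).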

This is how non-standard evaluation (NSE) works in R. Read more in the base R ?substitute manual page or in the R Language Definition, section "Computing on the language": https://cran.r-project.org/doc/manuals/R-lang.html#Computing-on-the-language
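
If the input really has to be the literal string ".(Id == ID)" from the question, one possible approach is to turn it into a language object with base R's parse(text = ...) and then splice it in as before. This is only a sketch under that assumption; join_from_string and .on are hypothetical names, and parsing arbitrary strings into code should be limited to trusted input.

```r
library(data.table)

x <- data.table(Id = c("A", "B", "C", "C"),
                X1 = c(1L, 3L, 5L, 7L),
                X2 = c(8L, 12L, 9L, 18L),
                XY = c("x2", "x4", "x6", "x8"))
z <- data.table(ID = "C", Z1 = 5:9, Z2 = paste0("z", 5:9))

# Hypothetical helper: string holds R code for the `on` argument.
join_from_string <- function(x, y, string) {
  on_expr <- parse(text = string)[[1L]]  # ".(Id == ID)" becomes the call .(Id == ID)
  eval(substitute(
    x[y, on = .on, mult = "all", allow.cartesian = TRUE],
    list(.on = on_expr)                  # splice the parsed call in place of .on
  ))
}

res <- join_from_string(x, z, ".(Id == ID)")
```

The same pattern extends to other parts of the call: parse each string once, then pass all of the resulting language objects in the list given to substitute().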

As for parametrizing other parts of data.table queries, you may find this question useful: "How can one work fully generically in data.table in R with column names in variables".

jangorecki