Let assume I have these two data.tables:
A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
"id" = c(1,2,1,1,2))
B <- data.table(idd = c(1,1,1,1,1),
datee = c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
value = c(1,2,3,4,5))
> A
date id
1: 2003-05-24 1
2: 2003-06-05 2
3: 2003-06-24 1
4: 2003-06-25 1
5: 2003-06-27 2
> B
idd datee value
1: 1 2003-05-25 1
2: 1 2003-06-06 2
3: 1 2003-06-25 3
4: 1 2003-06-26 4
5: 1 2003-06-28 5
For each id in A, I want to join the closest (based on date) prior value from B. This gives the desired result:
A[B, value := i.value, on = c("id" = "idd", "date" = "datee"), roll=-Inf]
> A
date id value
1: 2003-05-24 1 NA
2: 2003-06-05 2 NA
3: 2003-06-24 1 2
4: 2003-06-25 1 3
5: 2003-06-27 2 NA
The problem is, that I have a not just one column in B, but a few hundred. I really don't want to type out all those column names like valueXXX = i.valueXXX etc., especially since the number and names of he columns in B might change.
So I tried to do the rolling join like this:
C <- A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf]
> C
date id value
1: 2003-05-25 1 1
2: 2003-06-06 1 2
3: 2003-06-25 1 3
4: 2003-06-26 1 4
5: 2003-06-28 1 5
As you can see, the result is not at all, what I want. Can someone explain to me, why data.table is behaving like this? Also, what is the proper way of achieving my desired result without hardcoding all those column names?
EDIT: The link that Frank provided, solves my question indeed. Basically define a vector of the variables to be added and then use ":=" with mget:
vars <- c("value") # in my case hundreds of variables, but in this toy example just one
A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll=-Inf]