1

Let assume I have these two data.tables:

  A <- data.table(date = c("2003-05-24", "2003-06-05", "2003-06-24", "2003-06-25", "2003-06-27"),
                  "id" = c(1,2,1,1,2))

  B <- data.table(idd = c(1,1,1,1,1),
                  datee =  c("2003-05-25", "2003-06-06", "2003-06-25", "2003-06-26", "2003-06-28"),
                  value = c(1,2,3,4,5))
> A
         date id
1: 2003-05-24  1
2: 2003-06-05  2
3: 2003-06-24  1
4: 2003-06-25  1
5: 2003-06-27  2

> B
   idd      datee value
1:   1 2003-05-25     1
2:   1 2003-06-06     2
3:   1 2003-06-25     3
4:   1 2003-06-26     4
5:   1 2003-06-28     5

For each id in A, I want to join the closest (based on date) prior value from B. This gives the desired result:

A[B, value := i.value, on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> A
         date id value
1: 2003-05-24  1    NA
2: 2003-06-05  2    NA
3: 2003-06-24  1     2
4: 2003-06-25  1     3
5: 2003-06-27  2    NA

The problem is, that I have a not just one column in B, but a few hundred. I really don't want to type out all those column names like valueXXX = i.valueXXX etc., especially since the number and names of he columns in B might change.

So I tried to do the rolling join like this:

C <- A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf]

> C
         date id value
1: 2003-05-25  1     1
2: 2003-06-06  1     2
3: 2003-06-25  1     3
4: 2003-06-26  1     4
5: 2003-06-28  1     5

As you can see, the result is not at all, what I want. Can someone explain to me, why data.table is behaving like this? Also, what is the proper way of achieving my desired result without hardcoding all those column names?

EDIT: The link that Frank provided, solves my question indeed. Basically define a vector of the variables to be added and then use ":=" with mget:

vars <- c("value")  # in my case hundreds of variables, but in this toy example just one

A[B, (vars) := mget(paste0("i.", vars)), on = c("id" = "idd", "date" = "datee"), roll=-Inf]
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • 1
    I think this answers it...? https://stackoverflow.com/q/30468455/ – Frank Apr 23 '18 at 16:45
  • 1
    Thanks, @Frank! That solves my problem. Still I'm hoping to understand why A[B, , on = c("id" = "idd", "date" = "datee"), roll=-Inf] gives a totally different result. Just to get a better understanding of data.table. – user9621577 Apr 23 '18 at 16:54
  • 4
    @user9621577 I think you intended title / question you meant to put "rolling joins" and not "rolling joints". – steveb Apr 23 '18 at 17:27
  • 2
    Yeah, A[B] looks up each row of B in A and then returns the result, while A[B, v := i.v] edits A (using the last B row that matched each A row). It may be clearer if you lookup rows of A in B, like `A[, B[.SD, on=.(idd = id, datee = date), roll=TRUE]]`. In fact, I often do `A[, (cols) := B[.SD, on=.(idd = id, datee = date), roll=TRUE, ..cols]]` instead of the mget approach. There isn't yet a vignette on data.table joins, but I have some more notes about "update joins" here http://franknarf1.github.io/r-tutorial/_book/tables.html#joins-update – Frank Apr 23 '18 at 17:42
  • 1
    Here's another example of the `A[, v := B[.SD, ...]]` approach: https://stackoverflow.com/questions/42379658/find-time-to-nearest-occurrence-of-particular-value-for-each-row/42382399#42382399 – Frank Apr 23 '18 at 17:49
  • @Frank: Thanks for the explanation. And thanks for the link, it is very helpful. – user9621577 Apr 23 '18 at 22:58
  • 1
    @steveb: Hahaha, what a Freudian slip. Fixed it. – user9621577 Apr 23 '18 at 22:59

0 Answers0