7

I want to do a non-equi join between two data.tables (example to follow). Is there a way to rename the join-by variables in the on argument?

I tried the following but it does not produce the expected column names (initial_date and final_date). I know I can use data.table::setnames() to rename the variables.


library(data.table)

dt1 <- data.table(id = c("a", "b"), 
                  initial_date = as.Date(c("2019-01-01", "2019-02-01")),
                  final_date = as.Date(c("2019-02-01", "2019-03-01")))
dt2 <- data.table(id = c("a", "a", "b", "b"), 
                  x = c(1, 2, 3, 4),
                  date = as.Date(c("2019-01-01", "2019-01-04", "2019-02-03", "2019-02-19")))

dt2[dt1, 
    .(x = sum(x)),
    on = .(id, initial_date = date >= initial_date, final_date = date <= final_date), 
    by = .EACHI]
#>    id       date       date x
#> 1:  a 2019-01-01 2019-02-01 3
#> 2:  b 2019-02-01 2019-03-01 7

Created on 2019-12-26 by the reprex package (v0.3.0)

Giovanni Colitti
  • 1,982
  • 11
  • 24
  • 2
    There are proposals for names to come from dt1 https://github.com/Rdatatable/data.table/issues/1700 but none to manually choose names with x = y >= z. – Frank 2 Dec 26 '19 at 19:15
  • 2
    You could specify the columns you want and then remove the ones you don't after the join, not really ideal though. `dt2[dt1, on = .(id, initial_date = date >= initial_date, final_date = date <= final_date), by = .EACHI, .(x = sum(x), initial_date, final_date)][, c(names(dt1), "x"), with = FALSE]` – Matt Dec 26 '19 at 20:44
  • 2
    another option is to copy the columns before the join `dt2[, c("initial_date", "final_date") := .(date, date)][dt1, .(x = sum(x)), on = .(id, initial_date >= initial_date, final_date <= final_date), by = .EACHI]` – chinsoon12 Dec 27 '19 at 00:32

0 Answers0