1

I am looking for a "data.table way" of doing the following join. The join is fine using sqldf, but it seems rather slow to me. I am wondering if there is a faster solution using data.table (or possibly something else, but in R).

`

sqldf("
  select a.*, 
  b.c from table1 as a 
  left join table2 as b
  on a.id=b.id
  and a.date >= b.date
  and a.date <= b.enddate;
")

`

table1 looks like this: `

id  date    a   b
100 1/31/1986   0.02    16100.02
100 2/28/1986   -0.26   11960.00
100 3/31/1986   0.37    16330.00
100 4/30/1986   -0.10   15172.00
100 5/30/1986   -0.22   11793.86
100 6/30/1986   -0.01   11734.59
100 7/31/1986   -0.08   10786.34
100 8/29/1986   -0.62   4148.59
100 9/30/1986   -0.06   3911.53
100 10/31/1986  -0.24   3002.34
100 11/28/1986  0.06    3182.48
100 12/31/1986  -0.38   1981.55
100 1/30/1987   -0.21   1581.53
100 2/27/1987   0.00    1581.53
100 3/31/1987   -0.38   973.25
100 4/30/1987   -0.06   912.42
100 5/29/1987   -0.07   851.59
100 6/30/1987   0.00    NA

`

table2: `

c   id  date    ndate
0   100 2/28/1986   2/28/1987
0.418   100 2/28/1987   2/28/1988

`

Andy
  • 295
  • 1
  • 8
  • What are the columns in the 2 tables? – Tim Biegeleisen Aug 06 '15 at 13:31
  • 1
    perhaps `?foverlaps` may be useful here. – akrun Aug 06 '15 at 13:41
  • The columns in table1 are: id, date, a, b. In table2, the columns are c, id, date, ndate. Not sure if I understood your question. – Andy Aug 06 '15 at 14:20
  • 2
    Check [this](http://stackoverflow.com/a/25655497/559784) or [this](http://stackoverflow.com/a/31829804/559784) answer using `foverlaps()`. And [this](http://stackoverflow.com/search?q=foverlaps) for all other answers on `foverlaps`. – Arun Aug 06 '15 at 18:54

1 Answers1

0

Use the dplyr package. It has a left_join() function. To use it to get the join you want I think you'd do the full join and then filter to remove those that don't meet your criteria. You just have to think about it a little differently than in SQL as you can't do it all at once:

left_join(table1, table2, by("id" = "id")) %>%
    filter(x.date < y.date & x.date > enddate) 

filter is also from the dplyr package.

%>% is from the magrittr package.

During the join, the function will prefix duplicate field names with x or y to denote which side of the join it originated from and avoid duplicate names.

Feargal Ryan
  • 454
  • 3
  • 9