1

I'd like to join two tables together where the join is based on a date in a date field falling in a particular (start_date, end_date) range. Is there a nice elegant way to do this in R (via dplyr or otherwise).

So that if for example I have:

x <- data.frame(
       Ref = c(1:5),
       Event_Date = as.Date(c("2014-03-31", "2013-12-01", "2016-07-13", "2017-04-13", "2015-02-01"))
)

y <- data.frame(
       Period_ID = c(2013:2016),
       Period_Start = as.Date(c("2013-01-01", "2014-01-01", "2015-01-01", "2016-01-01")),
       Period_End = as.Date(c("2013-12-31", "2014-12-31", "2015-12-31", "2016-12-31"))
)

Then I'd like to run a left join such that:

z = left_join(x, y, by = ???)

but where the match is not "Date = Lookup Date" but "Date falls in the range (Start_Date, End_Date)

In other words. what I'd like to see is:

enter image description here

Thank you.

Alan
  • 619
  • 6
  • 19

2 Answers2

3

This looks like a job for sqldf, which allows you to formulate your joins in SQL. (SQL has the language means to express your join conditions directly.)

library(sqldf)
z <- sqldf("SELECT * FROM x
            LEFT JOIN y
            ON  x.Event_Date BETWEEN y.Period_Start AND y.Period_End") %>%
  select( Ref, Event_Date, Period_ID )
z

Output:

  Ref Event_Date Period_ID
1   1 2014-03-31      2014
2   2 2013-12-01      2013
3   3 2016-07-13      2016
4   4 2017-04-13        NA
5   5 2015-02-01      2015
mrhd
  • 1,036
  • 7
  • 16
3

Using data.table, you can add the new column to x with an update join

library(data.table)
setDT(x)
setDT(y)

x[y, on = .(Event_Date >= Period_Start, Event_Date <= Period_End),
  Period_ID := i.Period_ID]

x
#    Ref Event_Date Period_ID
# 1:   1 2014-03-31      2014
# 2:   2 2013-12-01      2013
# 3:   3 2016-07-13      2016
# 4:   4 2017-04-13        NA
# 5:   5 2015-02-01      2015
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38