5

I have a data frame with multiple date ranges (45 to be exact):

Range  Start       End
1      2014-01-01  2014-02-30
2      2015-01-10  2015-03-30
3      2016-04-20  2016-10-12
...    ...         ...

They will never overlap

I also have a data frame with various event dates (200K+):

Event  Date
1      2014-01-02
2      2014-03-20
3      2015-04-01
4      2016-08-18
...    ...

I want to test if these dates fall within any of these ranges:

Event  Date        InRange
1      2014-01-02  TRUE
2      2014-03-20  FALSE
3      2015-04-01  FALSE
4      2016-08-18  TRUE
...

What is the best way to perform this test? I have looked at lubridate's between and interval functions as well as various Stackoverflow questions, but cannot find a good solution.

smci
  • 32,567
  • 20
  • 113
  • 146
RUser
  • 588
  • 1
  • 4
  • 17

3 Answers3

6

You can create a vector of your date range from the first data frame, then use %in% operator to check if each date of your events is in this date range. Assuming your first data frame is dateRange, and second events, putting the above logic in one line would be:

events$InRange <- events$Date %in% unlist(Map(`:`, dateRange$Start, dateRange$End))

events
  Event       Date InRange
1     1 2014-01-02    TRUE
2     2 2014-03-20   FALSE
3     3 2015-04-01   FALSE
4     4 2016-08-18    TRUE

Where we used the Map to create the date range vector. Map combined with : operator create a list of date range from the Start to the End. Somewhere close to list(2014-01-01 : 2014-02-30, 2015-01-10 : 2015-03-30, 2016-04-20 : 2016-10-12 ...)(symbolically, not valid), with the unlist, we flatten it as a vector of date range which could then be used with %in% conveniently.

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Awesome one liner - Works a charm!!! Can you explain the unlist and Map part please? – RUser Jun 21 '16 at 01:08
  • 1
    Just updated the explanation. If it is still not clear, check the output of `Map(`:`, c(1,2), c(3,4))` and you will see how it works. – Psidom Jun 21 '16 at 01:15
4

Having ordered, non-overlapping intervals in your first "data.frame", you could test -for each event date- if it is above a $Start and its respective $End. Using findInterval to reduce relational comparisons and memory needed.

findInterval(events$Date, ranges$Start) > findInterval(events$Date, ranges$End)
#[1]  TRUE FALSE FALSE  TRUE

With data (modified "2014-02-30"):

ranges = structure(list(Range = 1:3, Start = structure(c(16071, 16445, 
16911), class = "Date"), End = structure(c(16129, 16524, 17086
), class = "Date")), .Names = c("Range", "Start", "End"), row.names = c(NA, 
-3L), class = "data.frame")

events = structure(list(Event = 1:4, Date = structure(c(16072, 16149, 
16526, 17031), class = "Date")), .Names = c("Event", "Date"), row.names = c(NA, 
-4L), class = "data.frame")
alexis_laz
  • 12,884
  • 4
  • 27
  • 37
1

Write your own function to check if a list of dates are in any of a number of intervals.

date.in <- function(x){
m <- NULL
for (i in 1:NROW(df)){m <- c(m,  ifelse(x>=df[i,1] & x<=df[i,2], TRUE, FALSE))}
any(m)}

Data:

df <- data.frame(start=c("2014-01-01", "2015-01-10", "2016-04-20"), 
       end=c("2014-02-30", "2015-03-30", "2016-10-12"))
df[] <- lapply(df, as.character)

s <- c("2014-01-02", "2014-03-20", "2015-04-01", "2016-08-18")

Test using string s.

as.character(lapply(s, date.in))#TRUE FALSE FALSE TRUE
milan
  • 4,782
  • 2
  • 21
  • 39