0

I have a dataframe structured like the following:

example <- data.frame(id = c(1,1,1,1,1,1,1,2,2,2,2,2),
                      event = c("email","email","email","draw","email","email","draw","email","email","email","email","draw"),
                      date = c("2020-03-01","2020-06-01","2020-07-15","2020-07-28","2020-08-07","2020-09-01","2020-09-15","2020-05-22","2020-06-15","2020-07-13","2020-07-15","2020-07-31"))

I am trying to filter out the emails within each id that don't fall into a range of 30 days before a draw event signified in the event column. This is the result I would like to end up with:

desiredResult <- data.frame(id = c(1,1,1,1,2,2,2),
                      event = c("email","draw","email","draw","email","email","draw"),
                      date = c("2020-07-15","2020-07-28","2020-09-01","2020-09-15","2020-07-13","2020-07-15","2020-07-31"))

I just need to only include events that take place within 30 days before each draw event. I'm not sure how to achieve this

gizaom
  • 184
  • 8
  • 1
    I was thinking you can create another data frame that has `draw` events and the date that 30 days from those events. Then use non-equi join in `data.table` package to find emails falling in those dates. Take a look at these examples https://stackoverflow.com/a/56281648/786542 & https://stackoverflow.com/a/49531164/786542 – Tung Oct 16 '20 at 00:42

1 Answers1

1

In each id, we could select rows which are - 30 days from the event = "draw".

library(dplyr)

example %>%
  mutate(date = as.Date(date)) %>%
  group_by(id) %>%
  filter(Reduce(`|`, purrr::map(date[event == 'draw'],
                     ~between(date, .x - 30, .x))))

#     id event date      
#  <dbl> <chr> <date>    
#1     1 email 2020-07-15
#2     1 draw  2020-07-28
#3     1 email 2020-09-01
#4     1 draw  2020-09-15
#5     2 email 2020-07-13
#6     2 email 2020-07-15
#7     2 draw  2020-07-31
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213