1

I have two data frames I would like to merge based on two variables: UserID & date <= 30 . I have the two data frames below:

df1 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), 
                      Full.Name = c( "John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown", "Chloe Brown", "John Smith" ), 
                      Info = c("yes", "no", "yes", "yes", "yes", "yes", "no", "yes"), 
                      EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L, 19L, 13L), DateTime = c("1/2/21 00:00", "1/5/21 12:00", "1/1/21 1:31", "1/5/21 3:34", "5/9/21 5:33", "5/8/21 3:39", "12/12/21 2:30", "12/11/21 9:21"), 
                      Temp = c("100", "103", "104", "103", "101", "102", "103", "105"), 
 
                      misc = c("(null)", "no", "(null)", "(null)", "(null)","(null)", "(null)", "(null)" 
                                    )), 
                 class = "data.frame", row.names = c(NA, 
                                                     -8L))

df2 <- structure(list(UserID = c(1L, 2L, 3L, 4L, 5L, 6L), 
                      Full.Name = c("John Smith", "Jack Peters", "Bob Brown", "Jane Doe", "Jackie Jane", "Sarah Brown"), 
                      DOB = c("1/1/90", "1/10/90", "1/2/90", "2/20/80", "2/2/80", "12/2/80"), 
                      EncounterID = c(13L, 14L, 15L, 16L, 17L, 18L), EncounterDate = c("1/1/21", "1/2/21", "1/1/21", "1/6/21", "5/7/21", "5/8/21"), 
                      Type = c("Intro", "Intro", "Intro", "Intro", "Care", "Out"), 
                      responses = c("(null)", "no", 
                                    "yes", "no", "no", "unsat")), 
                 class = "data.frame", row.names = c(NA, 
                                                     -6L))

As long as the UserID and the two dates (DateTime in df1 and EncounterDate in df2) are within 30 days from each other, they would be merged. In df1, we can see that there are two records of John Smith with the same UserID but only one of those would be merged because there's only one record that's within 30 days.

Ash S
  • 119
  • 5
  • 3
    (1) Your "date" columns are poorly-formatted strings, you need to convert them to `POSIXct`-class (timestamps) vectors using `as.POSIXct(paste0(DateTime,"-00"), format="%m/%d/%Y %H:%M:%S")` or such. (2) This is a not-uncommon question on SO: this is a range-based join (aka non-equi join), resolvable using `sqldf`, `fuzzyjoin`, or `data.table`. See https://stackoverflow.com/q/64362881/3358272 – r2evans Nov 01 '21 at 01:14
  • 1
    @r2evans. I thought your comment was very useful and more informed than my idle and silent reaction to it when I saw it yesterday. After looking at it I also went searching on "[r] fuzzy join criteria" and found what I thought was an even "better" duplicate at least in the sense of including setting a criterion based aon a specific range for date differences. – IRTFM Nov 01 '21 at 15:24
  • Thanks @IRTFM, the other link you provided is another good reference. I was admittedly in a bit of a rush when I posted that comment and only found an answer that I had previously done. – r2evans Nov 01 '21 at 18:15

0 Answers0