0

I need to merge two tables by zip codes such that date ranges from table, ID_dates should contain the ranges in table, zipcodes.

Description of data:

Table ID_dates gives pre and post that covers the entire range of dates for ID_dates.
Combination of ID and pre gives a unique combination for the group within which the ranges in zip codes must lie. Ranges in ID_dates are continuous with no gaps. Zipcode is string but I am unable to have R read as a character in my example below (I am new to R so please forgive my mistakes in the code below,if any)

The ranges in zipcodes do not cover the range of pre and post. The ranges in zipcodes may lie outside of the pre and post dates in ID_dates and hence the start and end dates of zipcodes where this happens have to be truncated. For each group (combination of zip code and ID), the endpoints are pre and post.

zipcodes <- read.table(text="
zipcode  start         end      var1
08210   03/15/1992   03/29/1992 1
08210   06/29/1993   06/30/1993 2
07039   03/28/1992   03/31/1992 5
07046   06/6/1994    06/6/1994  1
", colClasses='character',header=T, stringsAsFactors=F)

zipcodes$start   <- as.Date(zipcodes$start,format = "%m/%d/%y")
zipcodes$end     <- as.Date(zipcodes$end,format = "%m/%d/%y")
    

ID_dates <- read.table(text="
ID  zipcode    t1       t2          pre       post    var2
A   08210   03/17/1992  03/28/1992  03/17/1992  06/01/1993  1
A   08210   03/29/1992  04/05/1992  03/17/1992  06/01/1993  2
A   07039   04/06/1992  06/01/1993  03/17/1992  06/01/1993  3
A   07046   07/07/1995  08/07/1995  07/07/1995  08/07/1995  5
B   90002   01/01/1996  02/01/1996  01/01/1996  02/01/1996  8
", colClasses='character',header=T, stringsAsFactors=F)

ID_dates$t1   <- as.Date(ID_dates$t1,format = "%m/%d/%y")
ID_dates$t2     <- as.Date(ID_dates$t2,format = "%m/%d/%y")
ID_dates$pre     <- as.Date(ID_dates$pre,format = "%m/%d/%y")
ID_dates$post     <- as.Date(ID_dates$post,format = "%m/%d/%y")



finaltable <- read.table(text="
id   zipcode   t1           t2     var1  var2
A   08210   03/17/1992  03/28/1992  1   1
A   08210   03/29/1992  04/05/1992  NA  2
A   07039   04/06/1992  06/01/1993  NA  3
A   07046   07/07/1995  08/07/1995  NA  5
B   90002   01/01/1996  02/01/1996  NA  8
  ", header=T, stringsAsFactors=F)

I am unable to write a code to achieve the above result.

Pam
  • 111
  • 6
  • @Ronak Thanks Ronak. I have fixed the errors. – Pam Aug 12 '20 at 13:49
  • @RonakShah I have made the edits. I too got the same error and the same code worked fine with other data frames. Not sure why... – Pam Aug 13 '20 at 13:58
  • I see. Thanks for the update. Have you tried answers here - https://stackoverflow.com/questions/24480031/overlap-join-with-start-and-end-positions This question looks very similar. – Ronak Shah Aug 13 '20 at 15:01

0 Answers0