1

I have two tables as follows:

table_A

x | date

1   03/06
1   03/15
2   12/04
3   06/23
3   10/05

table_B

x | y | start_date | end_date

1   a    03/02       03/08
1   b    03/09       03/20
1   c    03/21       12/30
2   j    01/08       12/10
3   i    06/21       07/30
3   h    07/31       12/30

I'd like to add the column 'y' to table_A, by first matching table_A's x with table_B's x by making sure the date is within table_B's start_date and end_date, then returning the corresponding y.

table_A

x | date | y

1   03/06  a
1   03/15  b
2   12/04  j
3   06/23  i
3   10/05  h

It'd be great if you could help me with this, thank you!

  • Can you specify the dates precisely, i.e., add the year? Otherwise it is not really clear why 03/06 falls into [03/02,03/08] but not into [01/08,12/10]. There are other multiple matches. – dipetkov Mar 06 '19 at 19:59
  • These questions might be helpful: https://stackoverflow.com/questions/49906499/r-join-dataframes-using-a-key-and-then-approximate-dates?noredirect=1&lq=1 and https://stackoverflow.com/questions/38542811/join-dataframes-by-id-and-overlapping-date-range – divibisan Mar 06 '19 at 21:42

1 Answers1

1

You will probably need as.Date accompanied with a time format that suits your purpose (format = "%m/%d"). You can then simply test if each specific date is within each time range, and return the matching output.

a <- read.table(text="
                x   date
                1   03/06
                1   03/15
                2   12/04
                3   06/23
                3   10/05", header=T, stringsAsFactors=F)

b <- read.table(text="
                x   y    start_date  end_date
                1   a    03/02       03/08
                1   b    03/09       03/20
                1   c    03/21       12/30
                2   j    01/08       12/10
                3   i    06/21       07/30
                3   h    07/31       12/30", header=T, stringsAsFactors=F)


out <- NULL
for (i in 1:nrow(a)){
  test <- (as.Date(a$date[i], format = "%m/%d") > as.Date(b$start_date, format = "%m/%d") &
             as.Date(a$date[i], format = "%m/%d") < as.Date(b$end_date, format = "%m/%d") & a$x[i]==b$x)
  out <- c(out, b$y[test])
}

a$y <- out
a

  x  date y
1 1 03/06 a
2 1 03/15 b
3 2 12/04 j
4 3 06/23 i
5 3 10/05 h
milan
  • 4,782
  • 2
  • 21
  • 39
  • Thank you so much for helping out! I don't think I was very clear in describing my problem, I just edited the post, it should make more sense now after the edit. I forgot to mention earlier that values in the x column may have overlapping start and end intervals, therefore we have to match the x values from table_A with table_B first, before finding the corresponding y. @milan – user8157539 Mar 06 '19 at 19:32
  • 1
    Updated based on changes made to the question. – milan Mar 07 '19 at 06:58