I am working with the R Programming Language. I have the following tables (note: all variables appear as "Factors"):
table_1 = data.frame(id = c("123", "123", "125", "C125-B"),
date_1 = c("2010-01-31","2010-01-31", "2016-01-31", "2018-01-31" ))
table_1$id = as.factor(table_1$id)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id = c("5123", "123 A", "125", "125"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id = as.factor(table_2$id)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
> table_1
id date_1
1 123 2010-01-31
2 123 2010-01-31
3 125 2016-01-31
4 C125-B 2018-01-31
table_2
id date_2 date_3
1 5123 2009-01-31 2011-01-31
2 123 A 2010-01-31 2010-01-31
3 125 2010-01-31 2020-01-31
4 125 2010-01-31 2020-01-31
I am trying to "join" (e.g. inner join) this tables on the following conditions:
1) if table_1$id "fuzzy equal" table_2$id
AND
2) if table_1$date BETWEEN(table_2$date_2,table_2$date_3)
I tried to write the following code in R to do this:
library(fuzzyjoin)
stringdist_inner_join(table_1, table_2,
by ="id", distance_col = NULL)
Question: But I am not sure if the stringdist_inner_join
function can accommodate this kind of "between" logic.
Can someone please show me how to do this? Are there any other methods to accomplish this in R?
Thanks!