I am trying to perform a (simplified!) query on colour and date differences on the following two databases (extract):
A B
A.COL A.TIME B.COL B.TIME
1 blue 2009-01-31 1 blue 2007-01-31
2 blue 2009-02-28 2 blue 2008-12-31
3 blue 2009-03-31 3 blue 2009-02-28
4 blue 2009-04-30 4 blue 2009-04-30
5 blue 2009-05-31 5 blue 2009-06-30
6 blue 2009-06-30 6 blue 2016-08-31
7 blue 2016-03-31
8 blue 2016-04-30
9 red ...
10 red ...
What I want to do: Merge the tables based on COL as well as the difference in TIME, that is the difference betweeen both times must not be larger or lower than 2 month (or in other words lie between -2 and +2, depending from which date one starts with).
# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28 (diff = -1)
# for obs 2 from A, that would imply
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)
etc.
I was thinking about some kind of Date difference function, either from lubridate
, which was problematic in cases of months with less than 30 days and sometimes made NAs, or with as.yearmon
from zoo
, which worked to correctly calculate the differences at least. However, I wasn't able to implement this into sqldf
properly (Error: error in statement: near "as": syntax error). The reason seems to be that one cannot use every R function withing sqldf.
Any ideas how it can be done in R? I was also looking for an elegant way of how to subtract months from each other. There is this issue with lubridate:
Add/subtract 6 months (bond time) in R using lubridate, but here was one proposed way how to accomplish it with zoo
: Get the difference between dates in terms of weeks, months, quarters, and years
Get the data (thanks @bouncyball below for the code):
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)