This might have an easy solution, but I can't seem to crack it.
For an example, suppose I have a table listing purchases and customer details:
library(data.table)
purchase <- setDT(structure(list(Name = c("John", "John", "Mary"), Surname = c("Smith",
"Smith", "Jane"), PurchaseDate = c("2017-01-01", "2015-01-01",
"2017-01-02")), .Names = c("Name", "Surname", "PurchaseDate"), row.names = c(NA,
-3L), class = c("data.table", "data.frame")))
> purchase
Name Surname PurchaseDate
1: John Smith 2017-01-01
2: John Smith 2015-01-01
3: Mary Jane 2017-01-02
And I want to find if these customers held a valid discount card at a time of purchase, which matches data held in two databases:
df1 <- setDT(structure(list(Name = "John", Surname = "Smith", ValidFrom = "2016-12-31",
ValidTo = "2017-01-02"), .Names = c("Name", "Surname", "ValidFrom",
"ValidTo"), row.names = c(NA, -1L), class = c("data.table", "data.frame")))
df2 <- setDT(structure(list(Name = "Mary", Surname = "Jane", ValidFrom = "2017-01-01",
ValidTo = "2017-01-03"), .Names = c("Name", "Surname", "ValidFrom",
"ValidTo"), row.names = c(NA, -1L), class = c("data.table", "data.frame")))
> df1
Name Surname ValidFrom ValidTo
1: John Smith 2016-12-31 2017-01-02
> df2
Name Surname ValidFrom ValidTo
1: Mary Jane 2017-01-01 2017-01-03
I am adapting this solution, which uses data.table
library(data.table)
purchase[df1, on=c(Name='Name', Surname='Surname'), Match := 'Yes']
purchase[df2, on=c(Name='Name', Surname='Surname'), Match := 'Yes']
The results of this (based on left join) are saved to the Match
variable in the original purchase
table. (Importantly, this does not require creating a new object, but saves the result to the original object, otherwise it would become messy.)
> purchase
Name Surname PurchaseDate Match
1: John Smith 2017-01-01 Yes
2: John Smith 2015-01-01 Yes
3: Mary Jane 2017-01-02 Yes
But, I also need to check that the PurchaseDate
is within ValidFrom
and ValidTo
dates, and don't have a good idea of how to do this.
To do this, I can bring in the ValidFrom
and ValidTo
dates to the join and later determine whether the purchase was between those dates using ifelse
.
purchase[df1, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
purchase[df2, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
Great! This brings the dates:
Name Surname PurchaseDate Match VFrom VTo
1: John Smith 2017-01-01 Yes 2016-12-31 2017-01-02
2: John Smith 2015-01-01 Yes 2016-12-31 2017-01-02
3: Mary Jane 2017-01-02 Yes 2017-01-01 2017-01-03
But, the problem arises if customer had two discount cards, and a purchase falls in the period of validity of only one of them. Suppose Mary had two cards:
df2 <- setDT(structure(list(Name = structure(c(1L, 1L), .Label = "Mary", class = "factor"),
Surname = structure(c(1L, 1L), .Label = "Jane", class = "factor"),
ValidFrom = structure(1:2, .Label = c("2017-01-01", "1945-01-01"
), class = "factor"), ValidTo = structure(1:2, .Label = c("2017-01-03",
"1946-01-01"), class = "factor")), .Names = c("Name", "Surname",
"ValidFrom", "ValidTo"), row.names = c(NA, -2L), class = c("data.table", "data.frame")))
> df2
Name Surname ValidFrom ValidTo
1: Mary Jane 2017-01-01 2017-01-03
2: Mary Jane 1945-01-01 1946-01-01
Running this
purchase[df2, on=c(Name='Name', Surname='Surname'), `:=`(Match='Yes', VFrom=ValidFrom, VTo=ValidTo)]
Brings only one of these pairs of dates (apparently the earlies one, regardless of the row number).
Name Surname PurchaseDate Match VFrom VTo
1: John Smith 2017-01-01 Yes 2016-12-31 2017-01-02
2: John Smith 2015-01-01 Yes 2016-12-31 2017-01-02
3: Mary Jane 2017-01-02 Yes 1945-01-01 1946-01-01
How would I bring in all matched rows?
From what I learned, X[Y]
syntax supports appending to the original object (which I need), and also the :=
function, which I need, but does not support a full join. An alternative merge
supports full joins, but requires creation of new object at every join-step (will be extremely messy), and does not support :=
. Any ideas? Is there a way to use foverlaps
somehow?