I am doing a left non-equi join using data.table:
OUTPUT <- DT2[DT1, on=.(DOB, FORENAME, SURNAME, POSTCODE, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
The OUTPUT contains a correct left join, with the exception that the MONTH column (which is present in DT1) is missing.
Is this a bug in data.table?
NB: Of course, START_DATE, EXPIRY_DATE and MONTH are in the same YYYY-MM-DD, IDate format. The results of the join are correct based on these non-equi criteria. It is just that the column is missing and I need to use it in further work.
Edit 1: Simplified reproducible example
DT1 <- structure(list(ID = c(1, 2, 3), FORENAME = c("JOHN", "JACK",
"ROB"), SURNAME = c("JOHNSON", "JACKSON", "ROBINSON"), MONTH = structure(c(16953L,
16953L, 16953L), class = c("IDate", "Date"))), .Names = c("ID",
"FORENAME", "SURNAME", "MONTH"), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
DT2 <- structure(list(CERT_NUMBER = 999, FORENAME = "JOHN", SURNAME = "JOHNSON",
START_DATE = structure(16801L, class = c("IDate", "Date")),
EXPIRY_DATE = structure(17166L, class = c("IDate", "Date"
))), .Names = c("CERT_NUMBER", "FORENAME", "SURNAME", "START_DATE",
"EXPIRY_DATE"), row.names = c(NA, -1L), class = c("data.table",
"data.frame"))
OUTPUT <- DT2[DT1, on=.(FORENAME, SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH)]
> OUTPUT
   CERT_NUMBER FORENAME  SURNAME START_DATE EXPIRY_DATE ID
1:         999     JOHN  JOHNSON 2016-06-01  2016-06-01  1
2:          NA     JACK  JACKSON 2016-06-01  2016-06-01  2
3:          NA      ROB ROBINSON 2016-06-01  2016-06-01  3
FORENAME and SURNAME are joined on and are present in the output. MONTH is also (non-equi) joined on, and is absent from the output.
Why is this expected behaviour?
Even if it is expected behaviour, it is not helpful in my case, because I need to retain MONTH for further data manipulation.
My expected output would be the same table, but with the MONTH column retained as it is in DT1. After all, what I expect from a left join is for every row and column to be retained from the left table (DT1) and for all columns and only matched rows to be added from the right table (DT2).
   CERT_NUMBER FORENAME  SURNAME START_DATE EXPIRY_DATE ID      MONTH
1:         999     JOHN  JOHNSON 2016-01-01  2016-12-31  1 2016-06-01
2:          NA     JACK  JACKSON       <NA>        <NA>  2 2016-06-01
3:          NA      ROB ROBINSON       <NA>        <NA>  3 2016-06-01
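For reference, the expected table above can apparently be reproduced by naming the output columns explicitly in `j`. This is only a minimal sketch of a possible workaround, not an explanation of the default behaviour. It assumes a data.table version that supports non-equi joins and the `x.` / `i.` column prefixes; `OUTPUT_EXPECTED` is a hypothetical name introduced here, and the tables are rebuilt with `data.table()` using the same values as the `dput` output above.

```r
library(data.table)

# Same data as the reproducible example above, rebuilt for readability.
DT1 <- data.table(
  ID       = c(1, 2, 3),
  FORENAME = c("JOHN", "JACK", "ROB"),
  SURNAME  = c("JOHNSON", "JACKSON", "ROBINSON"),
  MONTH    = as.IDate(rep("2016-06-01", 3))
)
DT2 <- data.table(
  CERT_NUMBER = 999,
  FORENAME    = "JOHN",
  SURNAME     = "JOHNSON",
  START_DATE  = as.IDate("2016-01-01"),
  EXPIRY_DATE = as.IDate("2016-12-31")
)

# Assumption: x.<col> refers to the column as stored in DT2 (the table
# being subset) and i.<col> to the column as stored in DT1.
# OUTPUT_EXPECTED is a hypothetical name, not part of the original code.
OUTPUT_EXPECTED <- DT2[
  DT1,
  on = .(FORENAME, SURNAME, START_DATE <= MONTH, EXPIRY_DATE >= MONTH),
  .(CERT_NUMBER,
    FORENAME    = i.FORENAME,
    SURNAME     = i.SURNAME,
    START_DATE  = x.START_DATE,   # original DT2 value, NA when unmatched
    EXPIRY_DATE = x.EXPIRY_DATE,  # original DT2 value, NA when unmatched
    ID,
    MONTH       = i.MONTH)        # MONTH as it appears in DT1
]
```

Listing the columns by hand is verbose, but it makes explicit which table each value comes from, which is the part that is unclear in the default output.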
Edit 2: Apparently, in the output produced by my code, the START_DATE and EXPIRY_DATE values are also wrong! Only person 1 had a certificate, with a start date of 1 January and an expiry date of 31 December, as shown in the expected output. The actual output instead set both dates to 2016-06-01 (the MONTH value) in every row.