I wondered if you could help. I am trying to replicate this MS Access SQL query in R using data.table. The query is:
SELECT Tbl_X.code,
Min(Tbl_X.DateTime) AS MinOfDateTime,
Max(Tbl_X.EndDate) AS MaxOfEndDate
FROM Tbl_X
GROUP BY Tbl_X.code
HAVING (((Min(Tbl_X.DateTime)) < DateAdd("d",1,[Forms]![Frm_ControlPanel]![CalcDateFromForm])))
ORDER BY Min(Tbl_X.DateTime);
The HAVING
part is to filter out any observations on today's date.
My R code is:
library(data.table)
...
events <- as.data.table(event_data)[DateTime < max(DateTime),]
[order(DateTime),
.(Code,
Min_event_date = min(DateTime),
Max_validity_date = max(EndDate)),
by = Code]
I basically want a a single observation for each code that has the first date it's seen and the last date grouped by the code based on its first observation as the code can appear multiple times for every time its activated.
so from:
Code DateTime EndDate
A 2017-02-09 2017-04-09
A 2017-04-09 2017-06-09
A 2017-07-09 2017-09-09
B 2017-03-04 2017-05-11
B 2017-06-04 2017-08-13
To
Code DateTime EndDate
A 2017-02-09 2017-09-09
B 2017-03-04 2017-08-13
However, I am getting a slightly lower total figure that from the SQL. So I'm wondering if, firstly is what I am doing replicating the SQL and whether that is the most efficient way.