Say I have a data.table that looks as follows:
library(data.table)
dt = data.table(group = c(1,1,1,2,2,2,3,3,3),
                time = c("2016-03-09T08:31:00-05:00","2016-03-08T11:31:00-05:00","2016-03-06T08:31:00-05:00",
                         "2016-04-04T23:28:00-04:00","2016-04-10T23:28:00-04:00","2016-04-09T23:28:00-04:00",
                         "2016-05-11T19:52:00-04:00","2016-05-10T20:52:00-04:00","2016-04-11T19:52:00-04:00"))
dt
   group                      time
1:     1 2016-03-09T08:31:00-05:00
2:     1 2016-03-08T11:31:00-05:00
3:     1 2016-03-06T08:31:00-05:00
4:     2 2016-04-04T23:28:00-04:00
5:     2 2016-04-10T23:28:00-04:00
6:     2 2016-04-09T23:28:00-04:00
7:     3 2016-05-11T19:52:00-04:00
8:     3 2016-05-10T20:52:00-04:00
9:     3 2016-04-11T19:52:00-04:00
For each group in this data.table, I want to retain only the observations that are within 24 hours of that group's most recent time. I cooked up a nasty solution for this, but it's not nearly as fast as I need it to be on large datasets.
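library(lubridate)
# ymd_hms() parses the ISO 8601 strings, applies the UTC offsets and returns POSIXct in UTC
set(dt, j = "time", value = ymd_hms(dt[["time"]]))
# attach each group's latest time to every row, then keep rows less than one day older
dt[, .(mostRecent = max(time), time), by = group][
  time > (mostRecent - days(1)), .(group, time)]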
   group                time
1:     1 2016-03-09 13:31:00
2:     1 2016-03-08 16:31:00
3:     2 2016-04-11 03:28:00
4:     3 2016-05-11 23:52:00
5:     3 2016-05-11 00:52:00
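One way to avoid materialising the intermediate table that pairs every row with `mostRecent` is data.table's `.I` idiom: inside a grouped `j`, `.I` holds the row numbers (in `dt`) of the current group, so the grouped call can return only the indices worth keeping, and a single subset then pulls those rows out. This is a sketch, not a benchmarked answer; it rebuilds the example table so it runs on its own, and it assumes a fixed window of 86400 seconds, which matches `days(1)` here because `ymd_hms()` returns UTC times (no DST shifts). The names `idx` and `res` are just illustrative.

```r
library(data.table)
library(lubridate)

dt <- data.table(group = c(1,1,1,2,2,2,3,3,3),
                 time = c("2016-03-09T08:31:00-05:00","2016-03-08T11:31:00-05:00","2016-03-06T08:31:00-05:00",
                          "2016-04-04T23:28:00-04:00","2016-04-10T23:28:00-04:00","2016-04-09T23:28:00-04:00",
                          "2016-05-11T19:52:00-04:00","2016-05-10T20:52:00-04:00","2016-04-11T19:52:00-04:00"))
# Parse to POSIXct (UTC), modifying dt in place
set(dt, j = "time", value = ymd_hms(dt[["time"]]))

# For each group, .I is that group's row numbers in dt; keep the ones whose
# time is strictly later than (group maximum - 86400 seconds).
# The grouped result has columns `group` and `V1`; V1 holds the row numbers.
idx <- dt[, .I[time > max(time) - 86400], by = group]$V1

# One plain subset by row number, preserving the original row order
res <- dt[idx]
res
```

The strict `>` mirrors the original condition, which is why the group 2 row exactly 24 hours before that group's maximum is dropped.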
Does anyone have tips on how to accomplish this more elegantly or faster?
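Another option, again a sketch that has not been benchmarked here, is to split the work into two steps: add each group's maximum as a column by reference with `:=` (no copy of `dt`), then filter with a single vectorised comparison that needs no grouping at all. The helper column name `mostRecent` is borrowed from the original code, and the 86400-second window assumes UTC times as above. Whether data.table's internal "GForce" optimisation of `max` applies to a grouped `:=` depends on the installed version, so it is worth timing this against the alternatives on real data.

```r
library(data.table)
library(lubridate)

dt <- data.table(group = c(1,1,1,2,2,2,3,3,3),
                 time = c("2016-03-09T08:31:00-05:00","2016-03-08T11:31:00-05:00","2016-03-06T08:31:00-05:00",
                          "2016-04-04T23:28:00-04:00","2016-04-10T23:28:00-04:00","2016-04-09T23:28:00-04:00",
                          "2016-05-11T19:52:00-04:00","2016-05-10T20:52:00-04:00","2016-04-11T19:52:00-04:00"))
set(dt, j = "time", value = ymd_hms(dt[["time"]]))

# Step 1: store each group's latest time on every row of that group, in place
dt[, mostRecent := max(time), by = group]

# Step 2: one ungrouped, vectorised filter over the whole table
res <- dt[time > mostRecent - 86400, .(group, time)]

# Drop the helper column so dt is left as it was before step 1
dt[, mostRecent := NULL]
res
```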