I have a data table with a column for the starting_time
for an event. I need to duplicate the table for each minute between 7am
and the starting_time
.
I also need a new column adding for each duplicated data set, if there were 10 minutes between 7am
and the starting_time
, the data set would be duplicated 10 times. The first set would show 7:01am, the next set 7:02am etc with the final set being the same time as the starting_time
.
I am currently doing this with 2 loops but it very slow
dt <- data.table(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
time = c(rep("2020-07-23 12:00:00 BST", 3), rep("2020-07-23 12:10:00 BST", 3), rep("2020-07-23 12:20:00 BST", 3)),
h = c("a", "b", "c", "d", "e", "f", "g", "h", "i"))
s <- data.table(NULL)
fs <- data.table(NULL)
ids <- unique(dt$id)
start_time <- strptime(paste0(lubridate::ymd(Sys.Date()), " 11:50:00"), "%Y-%m-%d %H:%M:%S")
for (i in seq_along(ids)) {
d <- dt[id == ids[i]]
diff_minutes <- as.numeric(difftime(max(d$time), start_time, units='mins'))
for (j in 1:diff_minutes) {
ref_time <- start_time + (j * 60)
d[, time_stamp := ref_time]
d[, time_stamp_hour := substr(time_stamp, 12, 19)]
s <- rbindlist(list(s, d), use.names = T, fill = T)
}
fs <- rbindlist(list(fs, s))
s <- data.table(NULL)
}
The outer loop goes through each event since they all start at different times so the number of replications will be different. The inner loop is doing the duplicating and adding the new column.
This currently takes around a minute to run on the real data set which isn't a huge problem but it feels very inefficient.
Is there a more effecient way to do this using any of the functions within the data table package or without using loops?
Technically speaking the first data set should be the same as the start_time
so a 10 minute difference should result in 11
copies of the table but to keep it simple I just ignored it.