I have a data.table
containing time series of hourly observations from different locations (sites). There are gaps -- missing hours -- in each sequence. I want to fill out the sequence of hours for each site, so each sequence has a row for each hour (although data will be missing, NA).
Example data:
library(data.table)
library(lubridate)
DT <- data.table(site = rep(LETTERS[1:2], each = 3),
date = ymd_h(c("2017080101", "2017080103", "2017080105",
"2017080103", "2017080105", "2017080107")),
# x = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 3.1, 3.2, 3.3),
x = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3),
key = c("site", "date"))
DT
# site date x
# 1: A 2017-08-01 01:00:00 1.1
# 2: A 2017-08-01 03:00:00 1.2
# 3: A 2017-08-01 05:00:00 1.3
# 4: B 2017-08-01 03:00:00 2.1
# 5: B 2017-08-01 05:00:00 2.2
# 6: B 2017-08-01 07:00:00 2.3
The desired result DT2
would contain all the hours between the first (minimum) date and the last (maximum) date for each site, with x missing where the new rows are inserted:
# site date x
# 1: A 2017-08-01 01:00:00 1.1
# 2: A 2017-08-01 02:00:00 NA
# 3: A 2017-08-01 03:00:00 1.2
# 4: A 2017-08-01 04:00:00 NA
# 5: A 2017-08-01 05:00:00 1.3
# 6: B 2017-08-01 03:00:00 2.1
# 7: B 2017-08-01 04:00:00 NA
# 8: B 2017-08-01 05:00:00 2.2
# 9: B 2017-08-01 06:00:00 NA
#10: B 2017-08-01 07:00:00 2.3
I have tried to join DT
with a date sequence constructed from min(date)
and max(date)
. This is in the right direction, but the date range is over all sites rather than for each individual site, the filled in rows have missing site, and the sort order (key) is wrong:
DT[.(seq(from = min(date), to = max(date), by = "hour")),
.SD, on="date"]
# site date x
# 1: A 2017-08-01 01:00:00 1.1
# 2: NA 2017-08-01 02:00:00 NA
# 3: A 2017-08-01 03:00:00 1.2
# 4: B 2017-08-01 03:00:00 2.1
# 5: NA 2017-08-01 04:00:00 NA
# 6: A 2017-08-01 05:00:00 1.3
# 7: B 2017-08-01 05:00:00 2.2
# 8: NA 2017-08-01 06:00:00 NA
# 9: B 2017-08-01 07:00:00 2.3
So I naturally tried adding by = site
:
DT[.(seq(from = min(date), to = max(date), by = "hour")),
.SD, on="date", by=.(site)]
# site date x
# 1: A 2017-08-01 01:00:00 1.1
# 2: A 2017-08-01 03:00:00 1.2
# 3: A 2017-08-01 05:00:00 1.3
# 4: NA <NA> NA
# 5: B 2017-08-01 03:00:00 2.1
# 6: B 2017-08-01 05:00:00 2.2
# 7: B 2017-08-01 07:00:00 2.3
But this doesn't work either. Can anyone suggest the right data.table
formulation to give the desired filled-out DT2
shown above?