I am looking for a way to interpolate as I am merging two data sets.
I have one data frame with readings for a number of different data loggers, and a second with spot measurements. I need to match up the hand measurements with the logger readings from the same time so I can compare them and compute off-sets. Unfortunately, the logger measurements are regularly spaced (hourly), while the hand measurements are not at the intervals.
I would like to interpolate the logger values to get the logger value for the reading, but am struggling on how to make sure I'm grabbing the correct measurement.
sample data
library(tidyverse)
Start.date <- "2019-12-18 00:00:00"
end.date <- "2019-12-20 00:00:00"
set.seed(100)
loggers <- tibble(
datetime = rep(seq.POSIXt(as.POSIXct(Start.date), as.POSIXct(end.date), by='1 hour'),4),
Site = rep(LETTERS[1:4], each = 49),
reading = c(rnorm(49, mean = 10, sd = 3),
rnorm(49, mean = 15, sd = 3),
rnorm(49, mean = 20, sd = 3),
rnorm(49, mean = 25, sd = 3)
)
)
hand_meas <- tibble(
Site = rep(LETTERS[1:4], each = 2),
datetime = as.POSIXct(rep(c("2019-12-18 12:35:00", "2019-12-19 13:45:00", "2019-12-18 12:55:00", "2019-12-19 13:15:00" ),2)),
meas = c(10, 11, 14, 16, 19, 19.2, 23, 24)
)
head(loggers)
# # A tibble: 6 x 3
# datetime Site reading
# <dttm> <chr> <dbl>
# 1 2019-12-18 00:00:00 A 7.65
# 2 2019-12-18 01:00:00 A 6.99
# 3 2019-12-18 02:00:00 A 13.8
# 4 2019-12-18 03:00:00 A 12.3
# 5 2019-12-18 04:00:00 A 11.6
# 6 2019-12-18 05:00:00 A 14.3
head(hand_meas)
# # A tibble: 6 x 3
# Site datetime meas
# <chr> <dttm> <dbl>
# 1 A 2019-12-18 12:35:00 10
# 2 A 2019-12-19 13:45:00 11
# 3 B 2019-12-18 12:55:00 14
# 4 B 2019-12-19 13:15:00 16
# 5 C 2019-12-18 12:35:00 19
# 6 C 2019-12-19 13:45:00 19.2
My typical approach would be to either left_join()
the logger data to the hand measurements, or to use approx()
to interpolate values, but neither of these work in this situation.
## This fails because it needs exact matches
left_join(hand_meas, loggers, by = c("Site", "datetime"))
# # A tibble: 8 x 4
# Site datetime meas reading
# <chr> <dttm> <dbl> <dbl>
# 1 A 2019-12-18 12:35:00 10 NA
# 2 A 2019-12-19 13:45:00 11 NA
# 3 B 2019-12-18 12:55:00 14 NA
# 4 B 2019-12-19 13:15:00 16 NA
# 5 C 2019-12-18 12:35:00 19 NA
# 6 C 2019-12-19 13:45:00 19.2 NA
# 7 D 2019-12-18 12:55:00 23 NA
# 8 D 2019-12-19 13:15:00 24 NA
## Succeeds, but does includes readings from all of the sites
approx(loggers$datetime, loggers$reading, hand_meas$datetime)
# $x
# [1] "2019-12-18 12:35:00 PST" "2019-12-19 13:45:00 PST" "2019-12-18 12:55:00 PST" "2019-12-19 13:15:00 PST"
# [5] "2019-12-18 12:35:00 PST" "2019-12-19 13:45:00 PST" "2019-12-18 12:55:00 PST" "2019-12-19 13:15:00 PST"
#
# $y
# [1] 17.67616 19.19072 17.75920 18.91207 17.67616 19.19072 17.75920 18.91207
#
# Warning message:
# In regularize.values(x, y, ties, missing(ties)) :
# collapsing to unique 'x' values
I could also use data.table to get the nearest logger value, but my real data has significant fluctuations over the course of the day, so need to interpolate the measurements from either side
# This is close, Using data.table to join based on nearest timestamp see question 31818444
# https://stackoverflow.com/questions/31818444/join-two-data-frames-in-r-based-on-closest-timestamp
library(data.table)
setDT(hand_meas)[, logger_reading := setDT(loggers)[hand_meas, reading, on = c("Site", "datetime"), roll = "nearest"]]
head(hand_meas)
# Site datetime meas logger_reading
# 1: A 2019-12-18 12:35:00 10.0 12.21952
# 2: A 2019-12-19 13:45:00 11.0 13.19621
# 3: B 2019-12-18 12:55:00 14.0 13.86335
# 4: B 2019-12-19 13:15:00 16.0 15.64910
# 5: C 2019-12-18 12:35:00 19.0 20.76380
# 6: C 2019-12-19 13:45:00 19.2 19.54722
Can anyone suggest a way to do something like approx()
while limiting the source data based on the site? Or a data.table approach that interpolates rather than strictly matching?