I edited my question - please read the second part Edit 2
I need to join two dataset with the closet timestamp. The first datset is diary dataset from an mobile application:
df1 <- data.frame(stringsAsFactors=FALSE,
datetime = c("2019-03-19T13:26:52Z", "2019-03-19T13:26:19Z",
"2019-03-19T13:23:46Z", "2019-03-19T13:22:20Z",
"2019-03-19T13:09:56Z", "2019-03-19T13:06:04Z", "2019-03-19T13:05:21Z",
"2019-03-19T13:04:37Z", "2019-03-19T12:47:28Z",
"2019-03-19T12:46:42Z"),
transport = c("Trainride", "Trainride", "Trainride", "Trainride",
"Trainride", "Trainride", "Trainride", "Trainride",
"Trainride", "Trainride"),
id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
"5-3"),
disc = c("start", "stop", "start", "stop", "start", "stop", "start",
"stop", "start", "stop")
)
# datetime dttr object
df1 <- df1 %>%
mutate(datetime = lubridate::as_datetime(datetime))
here:
datetime transport id disc
1 2019-03-19 13:26:52 Trainride 5-3 start
2 2019-03-19 13:26:19 Trainride 5-3 stop
3 2019-03-19 13:23:46 Trainride 5-3 start
4 2019-03-19 13:22:20 Trainride 5-3 stop
5 2019-03-19 13:09:56 Trainride 5-3 start
6 2019-03-19 13:06:04 Trainride 5-3 stop
7 2019-03-19 13:05:21 Trainride 5-3 start
8 2019-03-19 13:04:37 Trainride 5-3 stop
9 2019-03-19 12:47:28 Trainride 5-3 start
10 2019-03-19 12:46:42 Trainride 5-3 stop
and the second dataset is dataset from accelerometer log, showing movement(=INVH) or idle(=NIVH) :
df2 <- data.frame(stringsAsFactors=FALSE,
datetime = c("2019-03-19T23:20:00Z", "2019-03-19T23:17:30Z",
"2019-03-19T13:08:00Z", "2019-03-19T13:07:00Z",
"2019-03-19T12:38:45Z", "2019-03-19T12:32:45Z",
"2019-03-19T11:13:15Z", "2019-03-19T11:11:45Z", "2019-03-19T10:17:45Z",
"2019-03-19T10:16:45Z"),
id = c("5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3", "5-3",
"5-3"),
code = c("NIVH", "INVH", "NIVH", "INVH", "NIVH", "INVH", "NIVH",
"INVH", "NIVH", "INVH")
)
# datetime dttr object
df2 <- df2 %>%
mutate(datetime = lubridate::as_datetime(datetime))
here:
datetime id code
1 2019-03-19 23:20:00 5-3 NIVH
2 2019-03-19 23:17:30 5-3 INVH
3 2019-03-19 13:08:00 5-3 NIVH
4 2019-03-19 13:07:00 5-3 INVH
5 2019-03-19 12:38:45 5-3 NIVH
6 2019-03-19 12:32:45 5-3 INVH
7 2019-03-19 11:13:15 5-3 NIVH
8 2019-03-19 11:11:45 5-3 INVH
9 2019-03-19 10:17:45 5-3 NIVH
10 2019-03-19 10:16:45 5-3 INVH
I need to join two data frames based on time difference between timestamp fields. for example left join on df1 to see how app diary data is an agreement with real accelerometer log. simple left join will not work here becuase in most of the case there is a lag time. So my questen is how can i join these two dataset based on the record where the difference is minimum absolute difference.
EDIT 2 --- The solution suggested by @soren is helpful, however If I take start datetime and endtime to have a interval , will not work. any thought?
df1 <- df1 %>%
mutate(datetime = lubridate::as_datetime(datetime)) %>%
arrange(datetime) %>%
mutate(datetime_end = lead(datetime),
# Create an interval object.
Travel_Interval = lubridate::interval(start = datetime, end = datetime_end))