1

I've seen various solutions for this question based on date only, but the time component is tripping me up. I have two data frames with POSIXct columns called 'datetime'. For DF1 that column has data rounded to the nearest hour. For DF2, the time component is not rounded to the nearest hour and can occur anytime. The dataframes look like this:

DF1

datetime X Y Z
2020-09-01 03:00:00 1 3 4
2020-09-02 12:00:00 12 3 5
2020-09-02 22:00:00 4 9 19
2020-09-03 01:00:00 4 10 2
2020-09-04 06:00:00 4 12 1
2020-09-04 08:00:00 11 13 10

DF2

datetime Var
2020-09-01 02:23:14 A
2020-09-01 03:12:09 B
2020-09-02 11:52:15 A
2020-09-02 12:15:44 B
2020-09-02 22:31:56 A
2020-09-02 21:38:05 B
2020-09-03 01:11:39 A
2020-09-03 00:59:33 B
2020-09-04 05:12:19 A
2020-09-04 06:07:09 B
2020-09-04 08:22:28 A
2020-09-04 07:50:17 B

What I want is to merge these two dataframes based on this column using the date and time that are closest in time to 'datetime' in DF1, so that it looks like this:

datetime X Y Z Var
2020-09-01 03:00:00 1 3 4 B
2020-09-02 12:00:00 12 3 5 A
2020-09-02 22:00:00 4 9 19 B
2020-09-03 01:00:00 4 10 2 B
2020-09-04 06:00:00 4 12 1 B
2020-09-04 08:00:00 11 13 10 B

Thank you!

Jacob
  • 329
  • 2
  • 10
  • I apologize, but I was able to hunt down a previous question that was very similar and provided a useable answer: https://stackoverflow.com/questions/39282749/r-how-to-join-two-data-frames-by-nearest-time-date – Jacob Feb 03 '22 at 16:46

1 Answers1

2

Adding helper columns for merge and group_by, using merge and then dplyr for the filtering

library(dplyr)

df1$tmp <- as.Date(df1$datetime)
df2$tmp <- as.Date(df2$datetime)

df1$grp <- 1:(nrow(df1))

merge(df1, df2, "tmp") %>% 
  group_by(grp) %>% 
  slice(which.min(abs(difftime(datetime.x, datetime.y)))) %>% 
  ungroup() %>% 
  select(-c(tmp,grp,datetime.y))
# A tibble: 6 × 5
  datetime.x              X     Y     Z Var  
  <chr>               <int> <int> <int> <chr>
1 2020-09-01 03:00:00     1     3     4 B    
2 2020-09-02 12:00:00    12     3     5 A    
3 2020-09-02 22:00:00     4     9    19 B    
4 2020-09-03 01:00:00     4    10     2 B    
5 2020-09-04 06:00:00     4    12     1 B    
6 2020-09-04 08:00:00    11    13    10 B

Data

df1 <- structure(list(datetime = c("2020-09-01 03:00:00", "2020-09-02 12:00:00", 
"2020-09-02 22:00:00", "2020-09-03 01:00:00", "2020-09-04 06:00:00", 
"2020-09-04 08:00:00"), X = c(1L, 12L, 4L, 4L, 4L, 11L), Y = c(3L, 
3L, 9L, 10L, 12L, 13L), Z = c(4L, 5L, 19L, 2L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-6L))

df2 <- structure(list(datetime = c("2020-09-01 02:23:14", "2020-09-01 03:12:09", 
"2020-09-02 11:52:15", "2020-09-02 12:15:44", "2020-09-02 22:31:56", 
"2020-09-02 21:38:05", "2020-09-03 01:11:39", "2020-09-03 00:59:33", 
"2020-09-04 05:12:19", "2020-09-04 06:07:09", "2020-09-04 08:22:28", 
"2020-09-04 07:50:17"), Var = c("A", "B", "A", "B", "A", "B", 
"A", "B", "A", "B", "A", "B")), class = "data.frame", row.names = c(NA, 
-12L))
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • This solution works just fine, but is a bit convoluted compared to the answer above which I finally found. Thank you for the effort and sorry for the time waste. – Jacob Feb 03 '22 at 16:49
  • 2
    @Jacob K, no worries. If you check all the answers, we now have a `data.table` solution, a self made merge function and this `dplyr` approach. So fortunately no redundancy here :) – Andre Wildberg Feb 03 '22 at 16:55