Find closest date in a df to a date in another df and subtract the difference

Question

I have 2 dataframes:

 df1<-structure(list(id = c(1, 2), date = structure(c(1483636800, 1485192000
), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA, 
-2L))

    df2<-structure(list(id.1 = structure(1:3, .Label = c("A", "B", "C"
), class = "factor"), sunrise = structure(c(1483617946, 1485198384, 
1485205584), class = c("POSIXct", "POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA, 
-3L))

df1:

  id                date
1  1 2017-01-05 12:20:00
2  2 2017-01-23 12:20:00

df2:

  id.1             sunrise
1    A 2017-01-05 07:05:46
2    B 2017-01-23 14:06:24
3    C 2017-01-23 16:06:24

I would like to find the closest sunrise date in df2 to each of the dates in df1, and then calculate the time difference (in hours) and put these new values in new column "closest" in df1. To find the time until sunrise, whereby a negative value would indicate a period after sunrise and a positive value a period before sunrise.

My df1 is very large > 100 million rows, so it's important that the solution works fast and efficient. The solution I found cannot accommodate the size of my data set, not even on a single column (will return: "Error: vector memory exhausted (limit reached?)". Attempts to remedy this issue were so far unsuccessful.

sDTA <- data.table(df1)[2]
rDTB <- data.table(df2)
try1<-sDTA[, closest := rDTB[sDTA, on = .(sunrise = date), roll = "nearest", x.sunrise]][]
        
try1$Time_until_sunrise<difftime(try1$closest,try1$date,units="hours")

I previously encountered exhausted memory issues while trying to use aggregate() functions, but was able to 'repair' this by replacing these with those that used dplyr. Perhaps this could be a solution again.

Is it assumed that df1 and df2 are sorted by date? – Bill O'Brien Jul 09 '21 at 15:56 — Bill O'Brien, Jul 09 '21 at 15:56
@BillO'Brien Not df1, but could be. – FlyingDutch Jul 09 '21 at 16:17 — FlyingDutch, Jul 09 '21 at 16:17
findInterval() in base R might be useful – Bill O'Brien Jul 09 '21 at 16:18 — Bill O'Brien, Jul 09 '21 at 16:18

Find closest date in a df to a date in another df and subtract the difference

0 Answers0