Retrieving Non-matches for Y (RIGHT JOIN) for Foverlaps in R using timestamps

Question

I am working on trying to do an merge on timestamp and another variable, specifically looking at how many ambulette pickups fall within a chain, the idea being that these are multiple pickups by the same ambulette in a chain considered the same overall trip, while those that are sole trips stand alone.

I've used sqldf for this but foverlaps seems much faster and more scalable so I would like to utilize this package or one like it. I am able to merge the two to find how many project start times fall within the chain times but I am not getting a return on the nonmatches in y.

Below is the code to reproduce, in this case with only one ambulette doenoted by the ID. I am using this link as a reference: Data Table merge based on date ranges

#example---
trips = data.table(
  "ambulette_id" = "1"
  ,"pickup" = as.POSIXct(c("2017-08-01 04:30:54", 
                        "2017-08-01 04:50:54", "2017-08-01 05:25:54", "2017-08-01 05:35:54", 
                        "2017-08-01 07:45:54", "2017-08-01 08:15:54", "2017-08-01 09:15:54", 
                        "2017-08-01 09:15:54", "2017-08-01 10:00:54", "2017-08-01 11:40:54", 
                        "2017-08-01 12:00:54", "2017-08-01 12:40:54"), tz = "GMT")
  ,"dropoff" = as.POSIXct(c("2017-08-01 05:00:59", 
                         "2017-08-01 05:00:59", "2017-08-01 05:55:59", "2017-08-01 05:55:59", 
                         "2017-08-01 08:35:59", "2017-08-01 08:35:59", "2017-08-01 09:30:59", 
                         "2017-08-01 09:45:59", "2017-08-01 10:30:59", "2017-08-01 11:50:59", 
                         "2017-08-01 12:15:59", "2017-08-01 13:05:59"), tz = "GMT")
  )[,pickup2:=pickup]


chains = data.table(
  "ambulette_id" = "1"
  ,"ambulette_chain_start" = as.POSIXct(c("2017-08-01 04:30:54", 
                            "2017-08-01 05:25:54", "2017-08-01 07:45:54", "2017-08-01 09:15:54"
  ), tz = "GMT")
  ,"ambulette_chain_end" = as.POSIXct(c("2017-08-01 05:00:59", "2017-08-01 05:55:59", 
                             "2017-08-01 08:35:59", "2017-08-01 09:45:59"),tz = "GMT")
  )



 #The final result is to merge trips on chains to get the pickups in trips that start in between the ranges in chains. Any pickups that don't match, should still show up as lone pickups, but instead foverlaps dumps them. Is there anyway to keep them?


setkey(trips,ambulette_id,pickup, pickup2)
final_join = foverlaps(chains
                       ,trips
                       ,by.x = c("ambulette_id", "ambulette_chain_start", "ambulette_chain_end"))[
                         ,pickup2:=NULL]

#test shows some trips not showing up in the final join
trips[!(pickup %in% final_join$pickup)]

Below is the SQL version that gets me the result I want:

#sqldf version
library(sqldf)
z = setDT(sqldf("SELECT 
               trips.ambulette_id, 
               trips.pickup,
               trips.dropoff,
               chains.ambulette_chain_start,
               chains.ambulette_chain_end
               FROM trips LEFT JOIN chains  
               ON trips.ambulette_id = chains.ambulette_id AND 
               pickup BETWEEN ambulette_chain_start AND ambulette_chain_end"))[
                 ,chained:=ifelse(is.na(ambulette_chain_start), "no", "yes")]

z

UPDATE:

The first response seems to answer almost all of the question, but I would like to keep the ambulette chain start and end columns from the merge so that the end product looks like the following. How would I do that?

ambulette_id              pickup             dropoff ambulette_chain_start ambulette_chain_end chained
 1:            1 2017-08-01 00:30:54 2017-08-01 01:00:59   2017-08-01 00:30:54 2017-08-01 01:00:59     yes
 2:            1 2017-08-01 00:50:54 2017-08-01 01:00:59   2017-08-01 00:30:54 2017-08-01 01:00:59     yes
 3:            1 2017-08-01 01:25:54 2017-08-01 01:55:59   2017-08-01 01:25:54 2017-08-01 01:55:59     yes
 4:            1 2017-08-01 01:35:54 2017-08-01 01:55:59   2017-08-01 01:25:54 2017-08-01 01:55:59     yes
 5:            1 2017-08-01 03:45:54 2017-08-01 04:35:59   2017-08-01 03:45:54 2017-08-01 04:35:59     yes
 6:            1 2017-08-01 04:15:54 2017-08-01 04:35:59   2017-08-01 03:45:54 2017-08-01 04:35:59     yes
 7:            1 2017-08-01 05:15:54 2017-08-01 05:30:59   2017-08-01 05:15:54 2017-08-01 05:45:59     yes
 8:            1 2017-08-01 05:15:54 2017-08-01 05:45:59   2017-08-01 05:15:54 2017-08-01 05:45:59     yes
 9:            1 2017-08-01 06:00:54 2017-08-01 06:30:59                  <NA>                <NA>      no
10:            1 2017-08-01 07:40:54 2017-08-01 07:50:59                  <NA>                <NA>      no
11:            1 2017-08-01 08:00:54 2017-08-01 08:15:59                  <NA>                <NA>      no
12:            1 2017-08-01 08:40:54 2017-08-01 09:05:59                  <NA>                <NA>      no

FURTHER UPDATE: implementing as suggested: 1. omitted the No assignment 2. added the chain assignments

trips[
  chains, on = .(ambulette_id, pickup > ambulette_chain_start,
                 pickup < ambulette_chain_end)
  ,':='(chained = 'yes'
        , ambulette_chain_start = ambulette_chain_start
        ,ambulette_chain_end = ambulette_chain_end)]

ambulette_id              pickup             dropoff             pickup2 chained ambulette_chain_start
 1:            1 2017-08-01 04:30:54 2017-08-01 05:00:59 2017-08-01 04:30:54      NA                  <NA>
 2:            1 2017-08-01 04:50:54 2017-08-01 05:00:59 2017-08-01 04:50:54     yes   2017-08-01 04:30:54
 3:            1 2017-08-01 05:25:54 2017-08-01 05:55:59 2017-08-01 05:25:54      NA                  <NA>
 4:            1 2017-08-01 05:35:54 2017-08-01 05:55:59 2017-08-01 05:35:54     yes   2017-08-01 05:25:54
 5:            1 2017-08-01 07:45:54 2017-08-01 08:35:59 2017-08-01 07:45:54      NA                  <NA>
 6:            1 2017-08-01 08:15:54 2017-08-01 08:35:59 2017-08-01 08:15:54     yes   2017-08-01 07:45:54
 7:            1 2017-08-01 09:15:54 2017-08-01 09:30:59 2017-08-01 09:15:54      NA                  <NA>
 8:            1 2017-08-01 09:15:54 2017-08-01 09:45:59 2017-08-01 09:15:54      NA                  <NA>
 9:            1 2017-08-01 10:00:54 2017-08-01 10:30:59 2017-08-01 10:00:54      NA                  <NA>
10:            1 2017-08-01 11:40:54 2017-08-01 11:50:59 2017-08-01 11:40:54      NA                  <NA>
11:            1 2017-08-01 12:00:54 2017-08-01 12:15:59 2017-08-01 12:00:54      NA                  <NA>
12:            1 2017-08-01 12:40:54 2017-08-01 13:05:59 2017-08-01 12:40:54      NA                  <NA>
    ambulette_chain_end
 1:                <NA>
 2: 2017-08-01 05:00:59
 3:                <NA>
 4: 2017-08-01 05:55:59
 5:                <NA>
 6: 2017-08-01 08:35:59
 7:                <NA>
 8:                <NA>
 9:                <NA>
10:                <NA>
11:                <NA>
12:                <NA>

I'm implementing incorrectly I assume, since the result is different than I am getting with sql or the previous solution

eddi · Accepted Answer · 2017-12-12T19:31:30.863

2

Here you go:

trips[, chained := 'no'][
      chains, on = .(ambulette_id, pickup >= ambulette_chain_start,
                                   pickup <= ambulette_chain_end)
      , `:=`(chained = 'yes',
             ambulette_chain_start = i.ambulette_chain_start,
             ambulette_chain_end = i.ambulette_chain_end)][]

edited Dec 12 '17 at 19:31

answered Dec 11 '17 at 22:36

eddi

49,088
6
104
155

Also how would I adjust the code to retain the the "ambulette_chain_start" and "ambulette_chain_end" variables? – LoF10 Dec 12 '17 at 14:58
by retain you mean add to trips? add it the same way as chained is added above. – eddi Dec 12 '17 at 16:48
I added an edit to the question so you can see what I mean. I tried just adding like you mentioned but I can't get it to work. – LoF10 Dec 12 '17 at 18:12
@LoF10 like I said, do the same thing as for `chained` above, just omit the initial 'no' assignment. I.e. replace `chained := 'yes'` with `':='(chained = 'yes', ambulette_chain_start = ambulette_chain_start, ...)` – eddi Dec 12 '17 at 18:23
I'm sorry eddi, I'm probably messing up the implementation, you may have to spell it out a bit more for me. I added an update with how I interpreted your directions. – LoF10 Dec 12 '17 at 18:38

Retrieving Non-matches for Y (RIGHT JOIN) for Foverlaps in R using timestamps

1 Answers1