I have two data.tables - one with location snapshots (df2
), and one with times when rides ended (df1
). My goal is to join the very large df2
on df1
where the times match closest and the bike_id
s match. I have tried everything in data.table
but nothing seems to work.
I am following this example almost verbatim to merge by bike_id
and by nearest time
, but I end up with empty columns.
sum(is.na(df1$time))
# [1] 0
sum(is.na(df2$time))
# [1] 0
sapply(df1, class)
# $time
# [1] "POSIXct" "POSIXt"
# $bike_id
# [1] "integer"
sapply(df2, class)
# $time
# [1] "POSIXct" "POSIXt"
# $bike_id
# [1] "integer"
# $st_x
# [1] "numeric"
# $st_y
# [1] "numeric"
all(df1$bike_id %in% df2$bike_id)
# [1] TRUE
Neither of the tables have NA
in time value, and both contain the same bike_id
s.
For example:
head(df1)
# time bike_id
# 1: 2017-12-20 07:29:22 16465
# 2: 2017-12-20 15:34:36 16465
# 3: 2017-12-09 22:08:56 16468
# 4: 2017-12-10 12:38:53 16468
# 5: 2017-12-18 00:19:18 16468
# 6: 2017-12-18 17:29:18 16468
head(df2)
# time bike_id st_x st_y
# 1: 2017-12-04 16:21:31 16465 -90.07687 29.95264
# 2: 2017-12-04 16:30:05 16465 -90.07687 29.95264
# 3: 2017-12-04 16:40:05 16465 -90.07687 29.95264
# 4: 2017-12-04 16:50:05 16465 -90.07687 29.95264
# 5: 2017-12-04 17:00:06 16465 -90.07687 29.95264
# 6: 2017-12-04 17:10:06 16465 -90.07687 29.95264
The code I am using to match by bike_id
and by nearest time
is:
setkey(df2, time, bike_id)[, dateMatch:=time]
test <- df2[df1, roll = 'nearest']
head(test)
# time bike_id st_x st_y dateMatch
# 1: 2017-12-20 07:29:22 16465 NA NA <NA>
# 2: 2017-12-20 15:34:36 16465 NA NA <NA>
# 3: 2017-12-09 22:08:56 16468 NA NA <NA>
# 4: 2017-12-10 12:38:53 16468 NA NA <NA>
# 5: 2017-12-18 00:19:18 16468 NA NA <NA>
# 6: 2017-12-18 17:29:18 16468 NA NA <NA>
Any ideas as to what could possibly be going wrong? Substituting roll=Inf
produces the same result with columns of NA
s.
Wouldn't roll='nearest'
find at least some date if bike_id
s are the same in both tables and time
is never NA
?