0

I have a table with the lat lon coordinates of mileposts on highway routes in my state. I want to use this table as a lookup table. I have a second table with milepost and highway-route columns. I want to assign lat lon columns to the second table based on the nearest milepost on a highway route.

In both tables the highway-route column is named routeID and the milepost column is named milePost.

(I generally work with node.js, so please forgive my limited knowledge of R nomenclature and syntax.)

I've tried to do this in many ways, but I can't seem to get my desired result. For example, below is a function that I built to iterate over every milepost value. I tried to narrow down the values I loop over by filtering the table by the routeID.

The following code is very much lacking. I'm sharing it so you get a sense of how I'm approaching the problem, but I'd like to get advice on how I can change my approach. What's the best way to do this? I have been using the dplyr package.

Tables in console:

> lookup
     routeID1 milePost1   lon      lat      tz              routeID milePost
      <chr>    <fctr>    <dbl>    <dbl>    <chr>             <dbl>   <dbl>
1:    I 019    102.08 -110.9789 32.19912 America/Phoenix      19   102.08
2:    I 019    000.00 -110.9412 31.33468 America/Phoenix      19     0.00
3:    I 019    069.69 -110.9835 31.90669 America/Phoenix      19    69.69
4:    I 019    042.43 -111.0628 31.67479 America/Phoenix      19    42.43
  ---                                                                       
8684: S 064    280    -111.6610 35.93829 America/Phoenix      64   280.00
8685: S 064    200    -112.1715 35.46088 America/Phoenix      64   200.00

> dataTable 
         date        time       routeID milePost     county_name
        <date>     <S3: hms>     <dbl>   <dbl>        <chr>
1:      2015-04-01 07:25:00       93       33         Mohave County
  ---   
450040: 2015-09-30 12:55:00       353      80         Cochise County               
450041: 2015-09-30 21:10:00       NA       NA         Maricopa County               
450042: 2015-09-30 22:55:00       17      204         Maricopa County                  

Function:

loop <- function(x,y){
  i<-0
  for(mile in x){

    i<-i+1

# filter to only rows that match dataTable$routeID 
  filt_Lookup <- lookup%>%filter(routeID == y[i])

# find index of closest matching milepost in filt_Lookup 
  index <-  which.min(abs(filt_Lookup$milePost - mile))

#use that index to find the corresponding lat coordinate 
  w<-filt_Lookup[index]$lat

  w
  } 
}

q <- dataTable%>%mutate(lat=loop(milePost, routeID))

Currently, my function does not add a new column of latitude values that corresponds with each row's dataTable$routeID and nearest dataTable$milePost values.

My expected results would do so by adding a lat column, like so:

> q 
         date        time       routeID milePost     county_name        lat
        <date>     <S3: hms>     <dbl>   <dbl>        <chr>            <dbl>
1:      2015-04-01 07:25:00       93       33         Mohave County   31.67479
  ---   
450040: 2015-09-30 12:55:00       353      80         Cochise County  35.46088             
450041: 2015-09-30 21:10:00       NA       NA         Maricopa County    NA             
450042: 2015-09-30 22:55:00       17      204         Maricopa County 35.46088

As a note: The above lat values are just placeholders and don't necessarily correspond. I need to add the corresponding lon values too.

Duplicate question response: It's unique because I need to merge rows with an exact routeID match and the closest possible milePost match.

Wimpel
  • 26,031
  • 1
  • 20
  • 37
  • Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – Shree Jun 08 '19 at 02:23
  • Also see - [Join R data.tables where key values are not exactly equal--combine rows with closest times](https://stackoverflow.com/questions/15712826/join-r-data-tables-where-key-values-are-not-exactly-equal-combine-rows-with-clo) – Shree Jun 08 '19 at 02:28
  • @Shree How can I use ```roll='nearest'``` in addition to finding the exact match of another comparison? How may I match ```routeID``` and then use ```roll='nearest'``` to find the closest milepost for each row? – Dinosaur Reporter Jun 08 '19 at 02:52
  • See if this helps - [Find nearest values of 2 columns from a larger 'lookup' style data.table](https://stackoverflow.com/questions/56259822/find-nearest-values-of-2-columns-from-a-larger-lookup-style-data-table/56260459#56260459). This question certainly has answers on this site. Searching for fuzzy join might help as well. – Shree Jun 08 '19 at 03:43
  • @Wimpel I believe the reason I can't adapt this solution is because I only have lat lon coordinates in one of the two tables. I need to merge the coordinates from the first table into the second table by using the mileposts in both tables as a reference. – Dinosaur Reporter Jun 08 '19 at 08:38
  • These more easily readable? Help me merge these tables by routeID and closest milePost? @Wimpel ```lookup <- data.table(routeID = c( 19,19,64,64), milePost = c( 102,0,69.69,42.43), lat = c( 32.19912,31.33468,35.93829,35.46088), lon = c( -110.9789,-110.9412,-111.6610,-112.1715) ) dataTable <- data.table( date = c(2015-04-01, 2015-09-30,2015-09-30,2015-09-30,2015-10-03,2015-11-03), time = c(05:25:00,07:25:00,07:25:45,04:4:00,02:29:00,07:25:44), routeID = c( 19,19,64,64,NA,64), milePost = c( 100,1,70,40,NA,70) )``` – Dinosaur Reporter Jun 08 '19 at 08:44

1 Answers1

0

sample data

taken from comment, slightly altered

library(data.table)
lookup <- data.table(routeID = c( 19,19,64,64), milePost = c( 102,0,69.69,42.43), lat = c( 32.19912,31.33468,35.93829,35.46088), lon = c( -110.9789,-110.9412,-111.6610,-112.1715) ) 
dataTable <- data.table( date = c("2015-04-01", "2015-09-30", "2015-09-30","2015-09-30", "2015-10-03", "2015-11-03"), 
                         time = c("05:25:00","07:25:00","07:25:45","04:4:00","02:29:00","07:25:44"), 
                         routeID = c( 19,19,64,64,NA,64), 
                         milePost = c( 100,1,70,40,NA,70) )

code

a rolling join is performed on the last keys of the joined data.tables.. so set RouteId as the first key, and milePost as the second

setkey( lookup, routeID, milePost )
setkey( dataTable, routeID, milePost )

then perform a rollin gupdate join, where you join the columns lat, lon and Milepost from the lookup-table into the dataTable table.

#rolling update join on dataTable
dataTable[ lookup, 
           `:=`( lat = i.lat, 
                 lon = i.lon, 
                 milePost.lookup = i.milePost), 
           roll = "nearest" ][]

output

#          date     time routeID milePost      lat       lon milePost.lookup
# 1: 2015-10-03 02:29:00      NA       NA       NA        NA              NA
# 2: 2015-09-30 07:25:00      19        1 31.33468 -110.9412            0.00
# 3: 2015-04-01 05:25:00      19      100 32.19912 -110.9789          102.00
# 4: 2015-09-30  04:4:00      64       40 35.46088 -112.1715           42.43
# 5: 2015-09-30 07:25:45      64       70 35.93829 -111.6610           69.69
# 6: 2015-11-03 07:25:44      64       70       NA        NA              NA
Wimpel
  • 26,031
  • 1
  • 20
  • 37