I have a massive data.frame with starting and ending coordinates (latitude and longitude), and I am using the georoute function from the taRifx.geo package to find out how far it is and how long it takes to drive from A to B.

The data looks something like this (both latlon and latlon_end are of class character):

> LL[1:10,14:15]
         latlon            latlon_end
1  52.481466 13.317647   52.518811 13.413034
2  52.518811 13.413034   52.504182 13.318051
3  52.504182 13.318051   52.502236 13.305396
4  52.502236 13.305396   52.548096 13.355104
5  52.548096 13.355104   52.569865 13.410967
6  52.569865 13.410967   52.54505 13.419071
7  52.54505 13.419071    52.527736 13.378182
8  52.527736 13.378182   52.495678 13.343019
9  52.495678 13.343019   52.496712 13.341767
10 52.496712 13.341767   52.458631 13.32529

And here is a for loop that I have written for the purpose:

for(i in 38753:100000){
  DT[i,]=tryCatch(t(as.matrix(unlist(georoute( c(as.character(LL$latlon[i]),
                                                  as.character(LL$latlon_end[i])),
                                                verbose=TRUE, returntype=c("time", "distance"))),
                               nrow = 1, ncol = 2)),
                   error=function(a) {"."} )

}

The base function here, georoute, returns a list of two elements, time and distance, which is why I unlist them before binding them all into a data frame. The tryCatch call is there to deal with the occasional error from georoute; I have no idea how else I could handle this.

I have tried a lot of methods, but only this one seems to work for me. The georoute function seems to take only one pair of latlon and latlon_end at a time, so I have to do this row by row. However, with a few hundred thousand entries, processing all of this data is taking me days or even weeks. I know I should go into the package and understand the code behind it (link inserted) so that I know what would be a better fit for this purpose, but the script is too advanced for my level and I don't even know exactly what I should be looking for in it. I guess I could use the lapply function for this, but I just can't make it work.

Any help, tips, or ideas would be greatly appreciated!

P.S. Update showing what georoute originally returns:

> georoute(c(as.character(LL$latlon[1]), as.character(LL$latlon_end[1])), verbose = FALSE, returntype = c("time","distance"))
  distance time
1     9.03 1338
> georoute(c(as.character(LL$latlon[1:3]), as.character(LL$latlon_end[1:3])), verbose = FALSE, returntype = c("time","distance"))
  distance time
1   35.599 5275
> class(georoute(c(as.character(LL$latlon[1]), as.character(LL$latlon_end[1])), verbose = FALSE, returntype = c("time","distance")))
[1] "data.frame"

I think the distance and time returned are numeric, because the summary of the result shows the quartiles, mean, median, etc.

Samantha
    I can't imagine the for loop being the bottleneck here. I'm not familiar with georoute but realistically it must take a long time to find a route between two points. A parallel version may be the only solution, if it is allowed. – Jesse Anderson Jan 14 '17 at 18:26

1 Answer

Consider bypassing the package and using its data source directly, namely Bing's Calculate a Route API, which serves JSON feeds from http://dev.virtualearth.net according to the request parameters. On a closer read, the package's GitHub source code is heavy with vector and matrix manipulation, which appears costly in processing. All that is really needed is to parse the JSON feed for the distance and time data points.

The code below uses the jsonlite library to send the same parameters as the package, building a URL for each pair of lat/lon waypoints. Once each JSON feed is imported, the needed data frame is extracted into a list. Do note: a Bing Maps API key is required, which you should already have per the package's requirements.

library(jsonlite)

BingMapsAPIkey <- "*****"

# ROWS OF LL TO PROCESS (NOTE: seq(38753:100000) WOULD YIELD 1:61248 INSTEAD)
rows <- 38753:100000

dfList <- lapply(rows, function(i) {

  # BUILD REQUEST URL; THE API EXPECTS "lat,lon" SO THE SPACE BECOMES A COMMA
  url <- paste0("http://dev.virtualearth.net/REST/v1/Routes?wayPoint.1=", 
                gsub(" ", ",", LL$latlon[i]) , "&wayPoint.2=", gsub(" ", ",", LL$latlon_end[i]),
                "&maxSolutions=1&optimize=time&routePathOutput=Points&distanceUnit=km&travelMode=Driving",
                "&key=", BingMapsAPIkey)      
  # RETURN A ONE-ROW DATAFRAME; ON ANY ERROR RETURN A ROW OF NAs
  tryCatch({
    jsondata <- fromJSON(url)
    return(jsondata$resourceSets$resources[[1]]$routeLegs[[1]]$routeSubLegs[[1]][c("travelDistance", "travelDuration")])
  }, error=function(e) return(data.frame(travelDistance=NA, travelDuration=NA)))

})

# ROW BIND DATAFRAME ELEMENTS IN LIST
geodf <- do.call(rbind, dfList)

# COLUMN BIND TO ORIGINAL DATAFRAME
df <- cbind(LL[rows,], geodf)

Output (using above posted Lat/Lon data)

#                 latlon          latlon_end travelDistance travelDuration
# 1  52.481466 13.317647 52.518811 13.413034          9.030           1338
# 2  52.518811 13.413034 52.504182 13.318051          8.148           1269
# 3  52.504182 13.318051 52.502236 13.305396          1.694            254
# 4  52.502236 13.305396 52.548096 13.355104         11.700            820
# 5  52.548096 13.355104 52.569865 13.410967          5.966            919
# 6  52.569865 13.410967  52.54505 13.419071          3.110            576
# 7   52.54505 13.419071 52.527736 13.378182          3.851            728
# 8  52.527736 13.378182 52.495678 13.343019          6.196           1051
# 9  52.495678 13.343019 52.496712 13.341767          0.986            277
# 10 52.496712 13.341767  52.458631 13.32529          6.129            947
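
If every row comes back as NA, the `tryCatch()` is likely trapping an error on each request. It can help to separate URL construction from the network call, so that a single URL can be printed, pasted into a browser, or passed to `fromJSON()` on its own. A minimal sketch, assuming a hypothetical helper name `build_route_url` (not part of the package or of the code above) and a placeholder key:

```r
# Hypothetical helper (an illustration, not part of taRifx.geo or jsonlite):
# turn one pair of "lat lon" strings into a Bing Routes request URL.
build_route_url <- function(from, to, key) {
  # "52.481466 13.317647" -> "52.481466,13.317647"
  to_csv <- function(x) gsub(" ", ",", trimws(as.character(x)), fixed = TRUE)
  paste0("http://dev.virtualearth.net/REST/v1/Routes?wayPoint.1=", to_csv(from),
         "&wayPoint.2=", to_csv(to),
         "&maxSolutions=1&optimize=time&routePathOutput=Points",
         "&distanceUnit=km&travelMode=Driving",
         "&key=", key)
}

# First row of the posted data, with a placeholder instead of a real key
url <- build_route_url("52.481466 13.317647", "52.518811 13.413034",
                       "BING_MAP_API_KEY")
print(url)

# With a real key, run the request once WITHOUT tryCatch() so that any
# error (for example a rejected key) is shown instead of being trapped:
# jsondata <- jsonlite::fromJSON(url)
```

Once a single request is confirmed to work, the same helper could be called inside the `lapply()` in place of the inline `paste0()`.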
Parfait
  • I just tried the code, but apparently the results are all being pasted into the same row, so I got the following message when trying to process 5 rows: `Warning message: In `[<-.data.frame`(`*tmp*`, c("time", "distance"), value = list( : provided 10 variables to replace 2 variables`. Do you know any way to make R paste the result of each pair row by row? – Samantha Jan 15 '17 at 12:59
  • That warning was for the `sapply` code. When I tried `vapply`, I got the same result pasted into all of the other rows. Any ideas? :/ And thanks nevertheless! It was so nice of you to even go and look at the source code :( – Samantha Jan 15 '17 at 13:09
  • Are you sure `georoute()` returns a list of two elements? Can you post the return value of a few of this function? From your `for` loop before your matrix conversion, print to screen the output of this method and edit in question. – Parfait Jan 15 '17 at 16:22
  • See update that now uses only `lapply()` to retrieve a list of dataframes from the iterative `georoute()` calls. From there row bind dfs together then cbind into `LL`. – Parfait Jan 15 '17 at 17:48
  • I experimented with just the `lapply` part, but it seems to be actually slower than the for loop, which is pretty weird – Samantha Jan 16 '17 at 15:31
  • See update where we reverse engineer the Git package and go directly to data source to parse json feeds. You will need a Bing Maps API key. – Parfait Jan 16 '17 at 18:20
  • I got all NAs for all columns and rows on my try.. just checked my API key still works tho – Samantha Jan 16 '17 at 19:52
  • Are you sure the data is structured exactly as you posted at that position? Check: `LL[38753:100000, c("latlon", "latlon_end")]`. Also, I am [reading](http://stackoverflow.com/questions/10213431/usage-limit-on-bing-geocoding-vs-google-geocoding): *Bing limits you 30,000 transactions per day in the free developer account*. Check the console: is there a *jsondata* object? Is *dfList* empty? – Parfait Jan 16 '17 at 21:22
  • Yep the data is correctly positioned (though 38753:10000 was just the one part of the data and I am doing a bit further now, but I changed that in the `seq()` accordingly so that should be fine). _dfList_ returns a nice list just with all NAs as both distance and time. _jsondata_ was not found in the global env tho! – Samantha Jan 16 '17 at 21:33
  • Error is being trapped by the `tryCatch()`. Outside of the loop, try building one url, replacing `i` with a number, like 1. Then call: `jsondata <- fromJSON(url)`. Even grab that URL and put in browser bar. See what message you receive. – Parfait Jan 16 '17 at 21:38
  • I replaced all `i` in the loop and called but `jsondata` is still not there, so I just got error trying to get the url from JSON. What do you mean exactly by "try building one url outside the loop"? – Samantha Jan 16 '17 at 22:18
  • Check if this URL works, which uses the first row's lat/lon pairs (be sure to add your API key at the end): `http://dev.virtualearth.net/REST/v1/Routes?wayPoint.1=52.481466,13.317647&wayPoint.2=52.518811,13.413034&maxSolutions=1&optimize=time&routePathOutput=Points&distanceUnit=km&travelMode=Driving&key=BING_MAP_API_KEY` – Parfait Jan 16 '17 at 22:40
  • yeh i got the message that my key wasn't accepted and checked that the api doesn't work anymore (it was still somehow working when i tried the `jsondata` thing). Then it's another problem. Don't really know how I can express how thankful I am though, you are truly a magician to me. – Samantha Jan 16 '17 at 22:50