Row by row distance calculation

Question

Suppose we have a data frame of a delivery agency where rows represent deliveries as follows:

Agent   Time of Delivery    Address
Alpha   12-30-2012 08:12    Location 1
Alpha   12-30-2012 08:18    Location 2
Alpha   12-30-2012 10:12    Location 3
Alpha   12-30-2012 10:25    Location 4
Beta    12-30-2012 08:30    Location 2
Beta    12-30-2012 09:44    Location 5
Beta    12-30-2012 18:11    Location 1
Gamma   12-30-2012 07:05    Location 6
Gamma   12-30-2012 08:30    Location 4
Gamma   12-30-2012 08:33    Location 3
Gamma   12-30-2012 14:12    Location 1
Gamma   12-30-2012 22:05    Location 2

Given the dataset above, I'd like to calculate the length of each delivery agent's daily route in km (assuming that there is a function that can calculate the distance between two addresses - is there?). My problem is twofold:

1. I must follow the time sequence in order to trace the real physical route of the agent, and "subtract" the former location "from" the latter. But how can one compare the current row with the "previous" row in R?
2. Knowing that the agents depart from the company HQ every morning and return to it every evening, I must add the legs HQ-first_address_of_the_day_of_each_agent and last_address_of_the_day_of_each_agent-HQ to each agent's daily calculation. This also means I must be able to figure out which rows are "neighbouring" (again, a comparison with the previous/next timestamp).

But how?

Your sample data doesn't have any distance information, so it's difficult to know how to start calculating total distance. See the suggestions for making a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It would also be helpful if you shared some code that showed some effort on your part. The `split()` and `embed()` functions can help separate data by agent and make sequential pairs of observations. - MrFlick, Jan 26 '15 at 19:40
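As a rough illustration of the `split()` / `embed()` suggestion in the comment above, here is a minimal base R sketch. The `deliveries` data frame is made up for the example, and it assumes the rows are already sorted by time within each agent and that every agent has at least two stops (`embed()` fails on shorter vectors).

```r
# Made-up sample data, already sorted by time within each agent
deliveries <- data.frame(
    agent   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
    address = c("Location 1", "Location 2", "Location 3",
                "Location 2", "Location 5"),
    stringsAsFactors = FALSE
)

# One character vector of addresses per agent
by_agent <- split(deliveries$address, deliveries$agent)

# embed(x, 2) returns one row per consecutive pair:
# column 1 holds the later stop, column 2 the earlier one
pairs <- lapply(by_agent, function(addr) {
    m <- embed(addr, 2)
    data.frame(from = m[, 2], to = m[, 1], stringsAsFactors = FALSE)
})

pairs$Alpha
```

Each element of `pairs` then lists the from/to legs for one agent, which could be passed to whatever distance function is available.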

GregF · Answer 1 · 2015-01-26T21:36:42.080

There are tons of ways to do this, as there often are when working in R. I'd solve this by using these two packages:

- dplyr (functions `group_by()` and `lead()` to answer your first question)
- ggmap (function `mapdist()` to find the distance between the locations using Google Maps)
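To show just the row-to-next-row part without calling any mapping service, here is a small sketch on a few rows shaped like the question's data. The column names (`agent`, `time`, `address`) and the `day` grouping are assumptions for this example; the timestamp format is taken from the sample in the question.

```r
library(dplyr)

# A few rows in the question's format, deliberately out of order
deliveries <- data.frame(
    agent   = c("Alpha", "Alpha", "Alpha", "Beta", "Beta"),
    time    = c("12-30-2012 10:12", "12-30-2012 08:12", "12-30-2012 08:18",
                "12-30-2012 09:44", "12-30-2012 08:30"),
    address = c("Location 3", "Location 1", "Location 2",
                "Location 5", "Location 2"),
    stringsAsFactors = FALSE
)

legs <- deliveries %>%
    # Parse the month-day-year timestamps so they sort chronologically
    mutate(time = as.POSIXct(time, format = "%m-%d-%Y %H:%M", tz = "UTC")) %>%
    arrange(agent, time) %>%
    # lead() is applied within each agent and day, so the last stop gets NA
    group_by(agent, day = as.Date(time)) %>%
    mutate(next_address = lead(address)) %>%
    ungroup()

legs
```

Every row now carries the address of the following stop for the same agent and day, so a distance function can be applied to the `address` / `next_address` pair.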

Note that depending on how big your dataset is, this solution may not work, because Google Maps limits the number of requests you can make. See here for more information.

To get you started, here's a quick example of how this solution might work, though it doesn't add in the start and end locations, and isn't particularly careful about making sure you don't go over the API limit.

For your second question, it kind of depends on how your dataset is structured. Are there multiple days in a single dataset? You could create a dummy dataset with each person's name and every available day to add on to the main dataset with rbind() and then arrange() the dataset to the correct order.

library(dplyr)
library(ggmap)

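distance_helper <- function(from, to) {
    # Query one pair at a time so the pause applies to every request
    mapply(function(f, t) {
        if (is.na(t)) return(NA_real_) # Last stop of an agent: no next address
        Sys.sleep(0.1) # To avoid running out of requests
        mapdist(f, t)$km
    }, from, to, USE.NAMES = FALSE)
}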

data <- data.frame(agent = c("a", "a", "a", "b", "b", "b"),
                   address = c("Atlanta", "Detroit", "Chicago",
                               "San Francisco", "Des Moines", "Austin"),
                   stringsAsFactors = FALSE)

out <- data %>%
    group_by(agent) %>%
    mutate(distance = distance_helper(address, lead(address)))

out
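For the HQ legs that the example above leaves out, one possible sketch follows. It uses `bind_rows()` in place of `rbind()`. The `toy_distance()` lookup and its distances are invented so the code runs offline; in practice it would be replaced by a real distance function. The `time` column, the "HQ" label, and the one-second offsets are also assumptions made for this illustration.

```r
library(dplyr)

# Hypothetical stand-in for a real distance function: made-up km values
toy_km <- c("HQ|Atlanta" = 10, "Atlanta|Detroit" = 20, "Detroit|HQ" = 30,
            "HQ|Austin" = 5, "Austin|HQ" = 5)
toy_distance <- function(from, to) unname(toy_km[paste(from, to, sep = "|")])

stops <- data.frame(
    agent   = c("a", "a", "b"),
    time    = as.POSIXct(c("2012-12-30 08:12", "2012-12-30 10:25",
                           "2012-12-30 09:44"), tz = "UTC"),
    address = c("Atlanta", "Detroit", "Austin"),
    stringsAsFactors = FALSE
)

# Per agent and day: a timestamp one second before the first stop and one
# second after the last (assumes no deliveries right at midnight)
hq_rows <- stops %>%
    group_by(agent, day = as.Date(time)) %>%
    summarise(start = min(time) - 1, end = max(time) + 1, .groups = "drop")

# Add the dummy HQ rows, then restore chronological order within each agent
route <- bind_rows(
    stops,
    transmute(hq_rows, agent, time = start, address = "HQ"),
    transmute(hq_rows, agent, time = end, address = "HQ")
) %>%
    arrange(agent, time)

# Distance of each leg, then the daily total per agent
totals <- route %>%
    group_by(agent, day = as.Date(time)) %>%
    mutate(km = toy_distance(address, lead(address))) %>%
    summarise(total_km = sum(km, na.rm = TRUE), .groups = "drop")

totals
```

The last row of each group has no next stop, so its `km` is `NA` and is dropped by `na.rm = TRUE` when summing.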
