5

I am trying to use R with the tidyverse packages and having trouble applying a function to my data. My data includes lat/long coordinates, and I want to calculate the distance from every location (row of my data frame) to a reference location. I am trying to use the geosphere::distm function.

library(tidyverse)
library(geosphere)

my_long <- 172
my_lat <- -43

data <- data %>% rowwise() %>% mutate(
  dist = distm(c(my_long, my_lat), c(long, lat), fun=distHaversine) # this works
)

I got it working using rowwise(), as above, but this is deprecated, so I want to know how to do it with the modern tidyverse, i.e., dplyr or purrr. The closest I have got is using map2:

my_distm <- function(long1, lat1, long2, lat2)
  distm(c(long1, lat1), c(long2, lat2), fun=distHaversine)

data <- data %>%  mutate(
  dist = map2(long, lat, my_distm, my_long, my_lat) # this doesn't
)

So far I have failed.

Simon Woodward
  • Is my problem that distm isn't a vectorised function? If it were, could I use it directly in mutate()? – Simon Woodward Aug 22 '17 at 01:55
  • Yes, that's why. Simply do `Vectorize(my_distm)` and it should work directly in your `mutate()` call. – Steven Beaupré Aug 23 '17 at 01:47
  • Hey @SimonWoodward looks like you've received some great answers below. Please consider accepting one (check mark to the left) to let the community know that that answer worked for you. – CPak Sep 09 '17 at 19:16
  • Actually I stuck with rowwise(). The other solutions were more complex than I wanted. Should I still check an answer below? – Simon Woodward Sep 10 '17 at 20:19

4 Answers

8

You could use distHaversine instead of distm, and cbind:

data %>%  mutate(dist = distHaversine(cbind(myLong, myLat), cbind(long, lat)))

Example data:

myLong = 172
myLat = -43 
long = c(180,179,179)
lat = c(-40,-41,-40)
data = data.frame(myLong,myLat,long,lat)

Which gives as result:

  myLong myLat long lat     dist
1    172   -43  180 -40 745481.0
2    172   -43  179 -41 620164.8
3    172   -43  179 -40 672076.2
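
If the reference point is a pair of scalars rather than columns of the data frame (as in the question), a minimal sketch along the same lines passes it with c(), as noted in the comments below. The names my_long, my_lat and df are illustrative, borrowed from the question rather than from the example data above:

```r
library(dplyr)
library(geosphere)

# Reference point as two scalars (assumed layout, as in the question)
my_long <- 172
my_lat <- -43

# Only the per-row coordinates live in the data frame
df <- data.frame(long = c(180, 179, 179), lat = c(-40, -41, -40))

# c() builds the single reference point; cbind() builds the n x 2 matrix
# of row coordinates. distHaversine() recycles the single point across rows.
out <- df %>%
  mutate(dist = distHaversine(c(my_long, my_lat), cbind(long, lat)))

print(out)
```

This should reproduce the dist column shown above without storing the reference point in every row.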
Lamia
  • If provided with two sets of n coordinates, `distm` returns an `n x n` matrix of the distances between every combination of positions taken from the two sets, whereas `distHaversine` returns a vector of length `n` with the distances between the two first positions, the two second positions, and so on. – Lamia Aug 22 '17 at 19:21
  • Why do you need cbind()? – Simon Woodward Aug 22 '17 at 21:25
  • You need `cbind` because `distHaversine` takes as input either a vector of 2 numbers or a matrix with 2 columns. If you use `rowwise()`, then mutate is applied row by row and you can use `c()`, but if not, you need to use `cbind` to combine the 2 vectors of long and lat. I assumed the two sets of positions (myLong, myLat, long, lat) were in the dataframe. If myLong and myLat are single values instead of columns of the dataframe, you can use `c()` for them instead of `cbind()`. – Lamia Aug 23 '17 at 00:11
  • **Attention**: `distHaversine()` gives you the great-circle distance, which assumes a spherical Earth and, depending on the use case, can give you inaccurate results. – Roman May 18 '19 at 02:39
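
To get a feel for how much this matters, a rough sketch can compare distHaversine() with geosphere's distGeo(), which computes the distance on the WGS84 ellipsoid by default. The points are the example data from the answer above; whether the difference is significant depends on the application:

```r
library(geosphere)

# Reference point and example locations from the answer above
ref <- c(172, -43)
pts <- cbind(long = c(180, 179, 179), lat = c(-40, -41, -40))

hav <- distHaversine(ref, pts)  # great-circle distance on a sphere
geo <- distGeo(ref, pts)        # distance on the WGS84 ellipsoid

# Relative difference between the two methods, per row
rel_diff <- abs(hav - geo) / geo
print(data.frame(hav = hav, geo = geo, rel_diff = rel_diff))
```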
7

You can use mutate with mapply:

library(tidyverse)
library(geosphere)

my_long <- 172
my_lat <- -43

df <- data.frame(long = c(170, 180), lat = c(-43, 43))
df %>% rowwise() %>% mutate(
  dist = distm(c(my_long, my_lat), c(long, lat), fun=distHaversine) # this works
)

#Source: local data frame [2 x 3]
#Groups: <by row>

# A tibble: 2 x 3
#   long   lat    dist
#  <dbl> <dbl>   <dbl>
#1   170   -43  162824
#2   180    43 9606752

df %>% mutate(
    dist = mapply(function(lg, lt) distm(c(my_long, my_lat), c(lg, lt), fun=distHaversine), long, lat)
)

#  long lat    dist
#1  170 -43  162824
#2  180  43 9606752

Update on using map2:

df %>% 
    mutate(dist = map2(long, lat, ~distm(c(my_long, my_lat), c(.x, .y), fun=distHaversine)))
# here .x stands for a value from long column, and .y stands for a value from lat column
#  long lat    dist
#1  170 -43  162824
#2  180  43 9606752

To use my_distm:

my_distm <- function(long1, lat1, long2, lat2)
    distm(c(long1, lat1), c(long2, lat2), fun=distHaversine)

df %>% mutate(dist = map2(long, lat, ~my_distm(my_long, my_lat, .x, .y)))
#  long lat    dist
#1  170 -43  162824
#2  180  43 9606752
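
One caveat worth checking: map2() always returns a list, so dist ends up as a list column even though it prints like a numeric one. A small sketch of the difference, assuming a plain numeric column is what you want (the column names dist_list and dist_dbl are only for illustration):

```r
library(dplyr)
library(purrr)
library(geosphere)

my_long <- 172
my_lat <- -43
df <- data.frame(long = c(170, 180), lat = c(-43, 43))

res <- df %>%
  mutate(
    # map2() collects the 1 x 1 matrices returned by distm() into a list
    dist_list = map2(long, lat,
                     ~ distm(c(my_long, my_lat), c(.x, .y), fun = distHaversine)),
    # map2_dbl() returns a plain double vector; as.numeric() drops the
    # matrix dimensions of each distm() result first
    dist_dbl = map2_dbl(long, lat,
                        ~ as.numeric(distm(c(my_long, my_lat), c(.x, .y),
                                           fun = distHaversine)))
  )

is.list(res$dist_list)    # list column
is.numeric(res$dist_dbl)  # numeric column
```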
Psidom
4

You could use pmap()

f  <- function(StartLong, StartLat, EndLong, EndLat) 
  distm(c(StartLong, StartLat), c(EndLong, EndLat))

df %>% mutate(dist = pmap_dbl(., f))

Or Vectorize() your function and use it directly in mutate():

g <- Vectorize(f)
df %>% mutate(dist = g(StartLong, StartLat, EndLong, EndLat))

Which gives:

#  StartLong StartLat EndLong EndLat    dist
#1       170      -43     172    -43  162824
#2       180       43     172    -43 9606752

Another idea, with by_row() from purrrlyr:

library(purrrlyr)

df %>%
  by_row(function(x) {
    distm(c(x$StartLong, x$StartLat), 
          c(x$EndLong, x$EndLat)) },
    .collate = "rows", .to = "dist") 

Which gives:

## tibble [2 x 5]
#  StartLong StartLat EndLong EndLat    dist
#      <dbl>    <dbl>   <dbl>  <dbl>   <dbl>
#1       170      -43     172    -43  162824
#2       180       43     172    -43 9606752

Data

df <- structure(list(StartLong = c(170, 180), StartLat = c(-43, 43), 
      EndLong = c(172, 172), EndLat = c(-43, -43)), .Names = c("StartLong", 
      "StartLat", "EndLong", "EndLat"), row.names = c(NA, -2L), class = "data.frame")
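
The Vectorize() approach also seems to fit the layout in the question, where the reference point is a pair of scalars and only long and lat are columns. A minimal sketch under that assumption, reusing the question's my_distm (the name v_distm and the small df_q data frame are made up for illustration):

```r
library(dplyr)
library(geosphere)

my_long <- 172
my_lat <- -43
df_q <- data.frame(long = c(170, 180), lat = c(-43, 43))

# as.numeric() turns the 1 x 1 matrix from distm() into a single number
my_distm <- function(long1, lat1, long2, lat2) {
  as.numeric(distm(c(long1, lat1), c(long2, lat2), fun = distHaversine))
}

# Vectorize() wraps the function in mapply(), which recycles the scalar
# reference coordinates against the long and lat columns
v_distm <- Vectorize(my_distm)

res_q <- df_q %>% mutate(dist = v_distm(my_long, my_lat, long, lat))
print(res_q)
```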
Steven Beaupré
  • Didn't know about `purrrlyr`...Is it available on CRAN? – CPak Aug 22 '17 at 01:36
  • @ChiPak Yes, it is. Also, as mentioned on [GitHub](https://github.com/hadley/purrrlyr): *`purrrlyr` contains some functions that lie at the intersection of `purrr` and `dplyr`. They have been removed from `purrr` in order to make the package lighter and because they have been replaced by other solutions in the `tidyverse`* – Steven Beaupré Aug 22 '17 at 01:44
  • I see now you provided your data at the end. I missed that earlier. – CPak Aug 22 '17 at 22:25
2

I'm very fond of rowwise as well, but since you're looking for other solutions, here are a few.

Using Psidom's data:

my_long <- 172
my_lat <- -43
myval <- c(my_long, my_lat)

df <- data.frame(long = c(170, 180), lat = c(-43, 43))

purrr solution

Here's purrr::map

library(dplyr)     # for %>% and mutate()
library(purrr)
library(geosphere) # for distm() and distHaversine
df1 <- df %>%  
         mutate(dist = map(1:nrow(.), ~distm(myval, df[.x,], fun=distHaversine)))

#   long lat    dist
# 1  170 -43  162824
# 2  180  43 9606752

You could also use map2, but only by repeating myval into the shape of a 2-column data.frame with one row per row of df; it does not work with myval as a plain vector.

request by OP

To select long and lat from a larger data frame to use with distm, use select in the map statement

garbage <- data.frame(long = c(170, 180), lat = c(-43, 43), junk=c(0,0))
df1 <- garbage %>%  
         mutate(dist = map(1:nrow(.), ~distm(myval, select(garbage[.x,],long,lat), fun=distHaversine)))

#   long lat junk    dist
# 1  170 -43    0  162824
# 2  180  43    0 9606752
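
Since distm accepts a vector or a 2-column matrix, another way to pick the columns by name is to cbind() them inside mutate() and skip the row-by-row mapping. This is only a sketch, reusing myval and garbage from above; df_vec is a made-up name:

```r
library(dplyr)
library(geosphere)

myval <- c(172, -43)  # reference point (long, lat), as above
garbage <- data.frame(long = c(170, 180), lat = c(-43, 43), junk = c(0, 0))

# distm() returns a 1 x n matrix here (one reference point, n rows);
# as.vector() flattens it into a plain numeric column.
df_vec <- garbage %>%
  mutate(dist = as.vector(distm(myval, cbind(long, lat), fun = distHaversine)))

print(df_vec)
```

The junk column is simply ignored because only long and lat are named in cbind().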

sapply solution with iterators

I also like to use iterators for rowwise operations

library(iterators)
df2 <- df %>%
         mutate(dist = sapply(iter(df, by="row"), function(x) distm(myval, x, fun=distHaversine)))

#   long lat    dist
# 1  170 -43  162824
# 2  180  43 9606752
CPak
  • Thanks, I was trying to make it work with purrr. How do I pick the df columns by name? – Simon Woodward Aug 22 '17 at 01:17
  • Sorry, what do you mean `pick df columns by name`? As far as I understand, `distm` only works on a vector or 2-column matrix... – CPak Aug 22 '17 at 01:21
  • Yes and my long and lat data are columns in a larger data frame (e.g., c(data[.x,long], data[.x,lat]) but this doesn't work). – Simon Woodward Aug 22 '17 at 01:23