
I've already looked through several answers but have not been able to apply them to my problem. See:

Calculating the distance between points in different data frames

Calculating number of points within a certain radius

find locations within certain lat/lon distance in r

find number of points within a radius in R using lon and lat coordinates

Identify points within specified distance in R

I have two data frames, loc and stop. For each stop I want to find the distance to each loc.

My locations

loc <- data.frame(station = c('Baker Street','Bank'),
                  lat = c(51.522236,51.5134047),
                  lng = c(-0.157080, -0.08905843),
                  postcode = c('NW1','EC3V')
                  )

My stops

stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
                   lat = c(51.53253,51.520865,51.490281,51.51224),
                   lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
                   postcode = c('EC1V','EC1A', 'W14', 'W2'))

As a final result I would like something like this:

df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'), 
                 stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'), 
                 dist = c('x','x','x','x','x','x','x','x'), 
                 lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224), 
                 lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
                 postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
                 )

My dataset is relatively big so I'm looking for an efficient method to solve this problem.

Any ideas on how to achieve this?

Davis
  • I may not be reading the question correctly but are you trying to find the distance between each point in the stop dataframe from each point in the loc dataframe? – Awhstin Dec 01 '16 at 21:18
  • @Awhstin Yes exactly...each distance from `stop` to `loc` – Davis Dec 01 '16 at 21:21
  • Coincidentally, I [answered a question yesterday](http://stackoverflow.com/a/40898595/496488) that has a base R approach that would work here if you substitute `loc` for `circles` and `stop` for `dat` and also make sure you carry through whichever columns you want to keep from each data frame. (The questions aren't duplicates, but the answers are similar.) – eipi10 Dec 01 '16 at 21:56
  • @eipi10 thanks a lot for pointing out your post, very useful. What units are the radius and distance in? Say I wanted a 5 km radius around the center of the circle: what value would I set as the radius, and how should I interpret the distance? Sorry if that's an obvious question. – Davis Dec 02 '16 at 00:01
  • The units are meters, since that's what `distHaversine` returns. – eipi10 Dec 02 '16 at 05:20
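
To make the unit point from the comments concrete: since `distHaversine` returns meters, a 5 km radius would be written as 5000. Below is a minimal sketch, assuming Baker Street as the circle centre. The names `stops` (used instead of `stop` to avoid masking base R's `stop()`), `centre`, `radius_m` and `within_radius` are illustrative and not from the original posts.

```r
library(geosphere)

# Circle centre (Baker Street) in (longitude, latitude) order, as geosphere expects
centre <- c(-0.157080, 51.522236)

# Same stops as in the question; the data frame name `stops` is an assumption
stops <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                    lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                    lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

radius_m <- 5 * 1000  # 5 km expressed in meters

# Distance in meters from every stop to the centre
stops$dist_m <- distHaversine(as.matrix(stops[, c("lng", "lat")]), centre)

# Keep only the stops that fall inside the radius
within_radius <- stops[stops$dist_m <= radius_m, ]
within_radius
```

Going by the distances reported in the first answer below, Barons Court (roughly 5.3 km from Baker Street) is the only stop that should fall outside this radius.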

2 Answers


This makes use of expand.grid and merge, plus some creative variable renaming. It's a little man-handly but it's pretty efficient since the operations are vectorized.

library(dplyr)
df <- expand.grid(station = loc$station, stop = stop$station) %>%
  merge(loc, by = 'station') %>%
  rename(loc = station, lat1 = lat, lng1 = lng, station = stop) %>%
  select(-postcode) %>%
  merge(stop, by = 'station') %>%
  rename(stop = station, lat2 = lat, lng2 = lng)
#           stop          loc     lat1        lng1     lat2      lng2 postcode
# 1        Angel Baker Street 51.52224 -0.15708000 51.53253 -0.105790     EC1V
# 2        Angel         Bank 51.51340 -0.08905843 51.53253 -0.105790     EC1V
# 3     Barbican Baker Street 51.52224 -0.15708000 51.52087 -0.097758     EC1A
# 4     Barbican         Bank 51.51340 -0.08905843 51.52087 -0.097758     EC1A
# 5 Barons Court Baker Street 51.52224 -0.15708000 51.49028 -0.214340      W14
# 6 Barons Court         Bank 51.51340 -0.08905843 51.49028 -0.214340      W14
# 7    Bayswater Baker Street 51.52224 -0.15708000 51.51224 -0.187569       W2
# 8    Bayswater         Bank 51.51340 -0.08905843 51.51224 -0.187569       W2

We can then use geosphere::distHaversine() (inspired by Jacob) to calculate the distances using the Haversine formula.

df$dist_meters <- geosphere::distHaversine(select(df, lng1, lat1),
                                           select(df, lng2, lat2))
df %>%
  select(stop, loc, dist_meters)
#           stop          loc dist_meters
# 1        Angel Baker Street    3732.422
# 2        Angel         Bank    2423.989
# 3     Barbican Baker Street    4111.786
# 4     Barbican         Bank    1026.091
# 5 Barons Court Baker Street    5328.649
# 6 Barons Court         Bank    9054.998
# 7    Bayswater Baker Street    2387.231
# 8    Bayswater         Bank    6825.897

And in case you're curious how the Haversine formula works,

latrad1 <- df$lat1 * pi/180
latrad2 <- df$lat2 * pi/180
# differences in latitude and longitude, converted to radians
dlat <- (df$lat2 - df$lat1) * pi/180
dlng <- (df$lng2 - df$lng1) * pi/180
a <- sin(dlat / 2)^2 + sin(dlng / 2)^2 * cos(latrad1) * cos(latrad2)
dist_rad <- 2 * atan2(sqrt(a), sqrt(1-a))
df %>%
  mutate(dist_meters_byhand = dist_rad * 6378137) %>%
  select(stop, loc, dist_meters_geosphere = dist_meters, dist_meters_byhand)
#           stop          loc dist_meters_geosphere dist_meters_byhand
# 1        Angel Baker Street              3732.422           3732.422
# 2        Angel         Bank              2423.989           2423.989
# 3     Barbican Baker Street              4111.786           4111.786
# 4     Barbican         Bank              1026.091           1026.091
# 5 Barons Court Baker Street              5328.649           5328.649
# 6 Barons Court         Bank              9054.998           9054.998
# 7    Bayswater Baker Street              2387.231           2387.231
# 8    Bayswater         Bank              6825.897           6825.897
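
If you would rather not depend on dplyr or geosphere, the same idea can be sketched in base R. This is only an illustration under a few assumptions: the helper name `haversine_m`, the data frame name `stops` (to avoid masking base R's `stop()`), and the prefixed column names are all made up for this example. It uses the same Earth radius (6378137 m) as the hand calculation above.

```r
# Haversine great-circle distance in meters; all inputs in decimal degrees.
# r defaults to 6378137 m, the radius used in the hand calculation above.
haversine_m <- function(lat1, lng1, lat2, lng2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * atan2(sqrt(a), sqrt(1 - a))
}

# Data from the question, with prefixed column names so nothing clashes
loc <- data.frame(loc = c("Baker Street", "Bank"),
                  loc_lat = c(51.522236, 51.5134047),
                  loc_lng = c(-0.157080, -0.08905843))
stops <- data.frame(stop = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                    stop_lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                    stop_lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# merge() with by = NULL returns the Cartesian product: every loc with every stop
pairs <- merge(loc, stops, by = NULL)

# One vectorized call computes all pairwise distances
pairs$dist_m <- haversine_m(pairs$loc_lat, pairs$loc_lng,
                            pairs$stop_lat, pairs$stop_lng)
pairs[, c("loc", "stop", "dist_m")]
```

Note that the cross join grows as nrow(loc) * nrow(stops), so for very large inputs memory may become the limiting factor rather than speed.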
Ben Fasoli
  • thanks for your answer, very helpful. What counts as close together? Would this work with data points within the same country (i.e. UK) or do I need spherical coordinates for distances this large? Additionally, what unit are the distances in your answer measured in? – Davis Dec 01 '16 at 23:02
  • I've changed the results to meters using the geosphere package as Jacob suggested. – Ben Fasoli Dec 01 '16 at 23:50

Not as clever (or probably as fast) as @Ben's but here's another way:

library(geosphere)

master_df <- data.frame()

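for (i in seq_len(nrow(loc))) {
  this_loc <- loc$station[i]
  # distm() expects (longitude, latitude) order, so select lng before lat
  dists <- distm(as.matrix(stop[, c("lng", "lat")]),
                 c(loc$lng[i], loc$lat[i]))
  # one row per stop, tagged with the current loc and its distance in meters
  temp_df <- cbind(stop, data.frame(loc = this_loc, dist = as.vector(dists)))
  master_df <- rbind(master_df, temp_df)
}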

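The default distance function in distm() has differed between geosphere versions (Haversine or distGeo), so pass fun = distHaversine explicitly if you specifically want the Haversine formula. Either might be useful if accuracy is required.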
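
As a possible variation on the loop above, `distm()` can take all stops and all locations in a single call and return the full distance matrix, which can then be reshaped to one row per pair. This is a sketch rather than a benchmarked solution; the names `stops` (instead of `stop`, to avoid masking base R's `stop()`), `d` and `long` are assumptions for illustration.

```r
library(geosphere)

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stops <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                    lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                    lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# distm() expects (longitude, latitude) columns.
# Result: one row per stop, one column per loc, values in meters.
d <- distm(as.matrix(stops[, c("lng", "lat")]),
           as.matrix(loc[, c("lng", "lat")]),
           fun = distHaversine)
dimnames(d) <- list(stop = as.character(stops$station),
                    loc = as.character(loc$station))

# Reshape the matrix to long format: one row per (stop, loc) pair
long <- as.data.frame(as.table(d), responseName = "dist_m")
long
```

The remaining columns (lat, lng, postcode) can be attached afterwards with a `merge()` on the station names if they are needed.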

Jacob
  • thanks for your help. I noticed that if I try your approach I don't get unique distances i.e. dist. `Angel` to `Baker Street` is the same as dist. `Angel` to `Bank`? – Davis Dec 01 '16 at 23:34