5

I have the following data frames:

library(dplyr)

d1 <- data_frame(
title = c("base1", "base2", "base3", "base4"),
lat = c(57.3, 58.8, 47.2, 57.8),
long = c(0.4, 3.4, 3.5, 1.2))

d2 <- data_frame(
tas = c("tas1", "tas2", "tas3", "tas4"),
Base= c ("base1", "base2", "base3", "base4"),
lat=c(54.6, 56.4, 54.2, 54.6),
long = c(1.2, 3.4, 3.5, 56.6))

What I would like to do is calculate the distance in miles between tas in d2 and title in d1. So in d2 tas1 has the coordinates 54.6 lat and 1.2 long, and has 'base1' in the 'Base' column. So I would like to calculate the distance between 54.6lat by 1.2long and 57.3lat and 0.4lon.

I've tried to do this using the GeoDistanceInMetresMatrix function as detailed below but the function doesn't quite give me the structure that I want.

The below article gives some information on GeoDistanceInMetresMatrix

http://eurekastatistics.com/calculating-a-distance-matrix-for-geographic-points-using-r/

This is what I would like the data to look like:

 df <- data_frame(
tas = c("tas1", "tas2", "tas3", "tas4"),
Base= c ("base1", "base2", "base3", "base4"),
lat=c(54.6, 56.4, 54.2, 54.6),
long = c(1.2, 3.4, 3.5, 56.6),
difference_miles = c(23, 35, 56, 23))

I've been looking at this all afternoon and can't quite get it right so any help would be appreciated!

Mrmoleje
  • 453
  • 1
  • 12
  • 35

4 Answers4

6

This is easily accomplished using the geosphere library:

d1 <- data.frame(
  title = c("base1", "base2", "base3", "base4"),
  lat = c(57.3, 58.8, 47.2, 57.8),
  long = c(0.4, 3.4, 3.5, 1.2))

d2 <- data.frame(
  tas = c("tas1", "tas2", "tas3", "tas4"),
  Base= c ("base1", "base2", "base3", "base4"),
  lat=c(54.6, 56.4, 54.2, 54.6),
  long = c(1.2, 3.4, 3.5, 56.6))

library(geosphere)
#1609.35 is the conversion from miles to meters
dist<-distGeo(d1[, c("long", "lat")], d2[, c("long", "lat")])/1609.35
df<-cbind(d2, difference_miles=dist)
Dave2e
  • 22,192
  • 18
  • 42
  • 50
  • Thanks so much, this is exactly what I'm looking for. Think i was over complicating with GeoDistanceInMetresMatrix – Mrmoleje Aug 13 '18 at 14:44
  • What if d2 looked like this `d2 <- data.frame(tas =c("tas1", "tas2", "tas3", "tas4"), Base= c ("base1", "base2", "base1", "base2"), lat=c(54.6, 56.4, 54.2, 54.6), long = c(1.2, 3.4, 3.5, 56.6))` – Mrmoleje Aug 13 '18 at 14:53
  • I don't fully understand your follow up question. The `distGeo` function calculates the distance between the 2 pairs of coordinates. As long a the vectors of the coordinates are the same length or an even multiple of each other it will still work. If there is not a 1 to 1 relationship between d1 and d2 then consider merging all of the data into 1 large dataframe to perform the calculation. See Dan's answer for a method to do that. – Dave2e Aug 13 '18 at 15:17
  • Okay I get it now. This only works for my specific example if I join the two data frames but I realise that now. Thanks for you help – Mrmoleje Aug 13 '18 at 15:45
5

One method might be to use the geosphere package:

# slightly modify your data because I want to merge it
df1 <- data.frame(
    title = c("base1", "base2", "base3", "base4"),
    lat1  = c(57.3, 58.8, 47.2, 57.8),
    long1 = c(0.4, 3.4, 3.5, 1.2), 
    stringsAsFactors = FALSE)

df2 <- data.frame(
    title = c ("base1", "base2", "base3", "base4"),
    lat2  = c(54.6, 56.4, 54.2, 54.6),
    long2 = c(1.2, 3.4, 3.5, 56.6), 
    stringsAsFactors = FALSE)

# merge your data so you're sure your lat/long pairs make sense
df <- merge(df1, df2, by="title")

# calculate distance according to the Haversine method (shortest dist around sphere)
df$dist_meters <- geosphere::distHaversine(
    p1=df[ , c("long1", "lat1")],
    p2=df[ , c("long2", "lat2")]  )

# convert meters to miles
df$dist_miles = df$dist_meters / 1609.34
DanY
  • 5,920
  • 1
  • 13
  • 33
  • Okay thanks. This makes sense. What if in df2 the title changes. So if it looked like `d2 <- data.frame(title =c("base1", "base2", "base1", "base2"), lat=c(54.6, 56.4, 54.2, 54.6), long = c(1.2, 3.4, 3.5, 56.6))`. – Mrmoleje Aug 13 '18 at 15:17
  • My point was just that it's "safer" to put both lat/lngs on one row of a dataframe (as opposed to having them in different dataframes) so that you are _sure_ that you're correctly matching up the "from" lat/lng to the "to" lat/lng. I changed your data so that a merge could be done before calculating distances, but you should adjust my code however you need so that it makes sense in your situation. – DanY Aug 13 '18 at 15:22
  • Okay I get it now and have managed to work it out with my data. Thank you! – Mrmoleje Aug 13 '18 at 15:44
2

You should also check out sp

library(sp)
p1 <- SpatialPoints(select(d1, long, lat))
p2 <- SpatialPoints(select(d2, long, lat))
spDists(p1, p2, longlat=TRUE, diagonal=TRUE)
# [1]  304.7427  267.2908  778.7028 3359.7988    (output is km)
CPak
  • 13,260
  • 3
  • 30
  • 48
  • This doesn't quite work. I get the error message `Warning message: In spDists(p1, p2, longlat = TRUE, diagonal = TRUE) : spDists: argument longlat conflicts with CRS(x); using the value TRUE` – Mrmoleje Aug 13 '18 at 15:01
  • An error and warning are different - the values are close to that provided with `geosphere` once you convert to miles – CPak Aug 13 '18 at 15:03
1

Since you're already using dplyr, you can easily add sf into your workflow. Here I make both your data frames into data frames with sf columns using the long/lat coordinates and a long/lat projection. Then I transform them each into a US foot-based projection and take the distances. You can add that distance vector to a joined version of your two initial data frames if you need to.

One note to be careful of is the order--I arranged d1_sf and d2_sf by the base labels, but if this doesn't work well in a larger or more complex dataset, or there are missing bases, you could use a join here to check.

library(tidyverse)
library(sf)

...

d1_sf <- st_as_sf(d1, coords = c("long", "lat"), crs = 4326) %>%
  arrange(title)
d2_sf <- st_as_sf(d2, coords = c("long", "lat"), crs = 4326) %>%
  arrange(Base)

distances <- st_distance(
  st_transform(d1_sf, crs = 2234),
  st_transform(d2_sf, crs = 2234),
  by_element = T
)

distances
#> Units: US_survey_foot
#> [1]  1035387.8   916425.4  2591457.0 11553291.3

inner_join(d1, d2, by = c("title" = "Base"), suffix = c("1", "2")) %>%
  mutate(dist = distances) %>%
  mutate(dist_mi = dist / 5280)
#> # A tibble: 4 x 8
#>   title  lat1 long1 tas    lat2 long2 dist               dist_mi          
#>   <chr> <dbl> <dbl> <chr> <dbl> <dbl> <S3: units>        <S3: units>      
#> 1 base1  57.3   0.4 tas1   54.6   1.2 " 1035387.8 US_su… " 196.0962 US_su…
#> 2 base2  58.8   3.4 tas2   56.4   3.4 "  916425.4 US_su… " 173.5654 US_su…
#> 3 base3  47.2   3.5 tas3   54.2   3.5 " 2591457.0 US_su… " 490.8062 US_su…
#> 4 base4  57.8   1.2 tas4   54.6  56.6 11553291.3 US_sur… 2188.1234 US_sur…

Created on 2018-08-13 by the reprex package (v0.2.0).

camille
  • 16,432
  • 18
  • 38
  • 60