I am trying to map a group of locations to their nearest metropolitan areas. Dataset 1 (df1) contains address locations with longitude and latitude. I want to map each address to all the metropolitan areas (in a data frame df2) that are within a 50-mile radius.

g_lat <- c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493)
g_lon <- c(-122.67648,-86.13490, -118.24368, -122.08385, -122.41942)
address <- c(1,2,3,4,5)
df1 <- data.frame(g_lat, g_lon, address)

g_lat <- c(+37.7737185, +45.5222208,+37.77493)
g_lon <- c(-122.2744317,-098.7041549,-122.41942)
msa <- c(1,2,3)
df2 <- data.frame(g_lat, g_lon, msa)

I want output as follows showing all the msa that this address is associated with:

 address g_lat    g_lon         msa 
  5     37.77493  -122.41942     1
  5     37.77493  -122.41942     3

Please kindly let me know how this can be achieved. I have tried the following:

library(geosphere)
# create distance matrix
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)

# assign the name to the point in list1 based on shortest distance in the matrix
df1$locality <- df2$locality[max.col(-mat)]

The distm() call gives this error:

Error in .pointsToMatrix(y) : longitude < -360
user3570187
  • Related / possible duplicate: [*Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)*](https://stackoverflow.com/q/31668163/2204410) – Jaap Sep 20 '18 at 13:41
  • I checked your solution. I am getting an error "Error in .pointsToMatrix(y) : longitude < -360", I think the data points in the two data frames are causing some difficulty if I use your first solution. Please advise. thanks so much – user3570187 Sep 20 '18 at 13:51
  • Could you include the code you've tried in your question? – Jaap Sep 20 '18 at 13:54
  • @Jaap I have updated the code – user3570187 Sep 20 '18 at 14:02
  • I don't get that error message, but this problem should be solved a bit differently than the solution I linked above (if I understand your problem correctly). See my answer below. – Jaap Sep 20 '18 at 14:11

1 Answer

A possible solution:

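library(geosphere)

# distance matrix in meters: rows are the df1 addresses, columns are the df2 msa's
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)

# row and column indexes of every pair closer than 80000 m (roughly 50 miles)
ri <- row(mat)[mat < 80000]
ci <- col(mat)[mat < 80000]

# repeat each matching address row and attach the matching msa
df3 <- df1[ri,]
df3$msa <- df2[ci, "msa"]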

which gives:

> df3
       g_lat     g_lon address msa
4   37.38605 -122.0838       4   1
5   37.77493 -122.4194       5   1
4.1 37.38605 -122.0838       4   3
5.1 37.77493 -122.4194       5   3

Using either data.table or dplyr:

library(data.table)
setDT(df1)[ri][, msa := df2[ci, "msa"]][]

library(dplyr)
df1 %>% 
  slice(ri) %>% 
  mutate(msa = df2[ci, "msa"])

You can add the distance with:

df3$dist <- mat[cbind(ri, ci)]

which gives:

> df3
       g_lat     g_lon address msa     dist
4   37.38605 -122.0838       4   1 46202.74
5   37.77493 -122.4194       5   1 12774.31
4.1 37.38605 -122.0838       4   3 52359.08
5.1 37.77493 -122.4194       5   3     0.00
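
As a variation on the selection step above, here is a minimal sketch that states the radius in miles and converts it to meters, instead of using the rounded 80000. It assumes the data from the question; `radius_m`, `idx`, `pairs` and `dist_miles` are names introduced only for this illustration, and `which(..., arr.ind = TRUE)` is used in place of `row()`/`col()` to get both indexes in one step.

```r
library(geosphere)

# Data from the question
df1 <- data.frame(
  g_lat   = c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493),
  g_lon   = c(-122.67648, -86.13490, -118.24368, -122.08385, -122.41942),
  address = 1:5
)
df2 <- data.frame(
  g_lat = c(37.7737185, 45.5222208, 37.77493),
  g_lon = c(-122.2744317, -98.7041549, -122.41942),
  msa   = 1:3
)

# distm() returns meters, so express 50 miles in meters (1 mile = 1609.344 m)
radius_m <- 50 * 1609.344

mat <- distm(df1[, c("g_lon", "g_lat")], df2[, c("g_lon", "g_lat")],
             fun = distVincentyEllipsoid)

# Two-column matrix ("row", "col") with one line per pair inside the radius
idx <- which(mat < radius_m, arr.ind = TRUE)

# Address rows, the matching msa and the distance converted back to miles
pairs <- data.frame(
  df1[idx[, "row"], ],
  msa        = df2$msa[idx[, "col"]],
  dist_miles = mat[idx] / 1609.344,
  row.names  = NULL
)
pairs
```

For this sample data the exact threshold selects the same four address/msa pairs as the 80000 m cut-off; with real data the two can differ for pairs between roughly 49.7 and 50 miles apart.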
Jaap
  • thanks so much, is it possible to get the distance to the nearest msa? – user3570187 Sep 20 '18 at 17:55
  • Awesome, thanks so much. What are ri and ci, are these row and column names? To get the minimum instead of the 50-mile radius, I used df3$near_dist <- apply(mat, 1, min) to get the minimum-distance msa. Thanks so much, you made my day! – user3570187 Sep 21 '18 at 10:43
  • @user3570187 `ri` and `ci` are row and column indexes respectively; instead of `apply(mat, 1, min)` you could also use `max.col(-mat)` which is more efficient – Jaap Sep 21 '18 at 10:45
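
The follow-up in the comments (nearest msa and its distance for every address) could be sketched as below. This is an illustration under the assumption that the data is the sample from the question; `nearest`, `near_msa` and `near_dist` are names chosen here. Note that `max.col(-mat)` returns the column index of the smallest distance in each row, not the distance itself, so the distance is looked up in `mat` afterwards. `ties.method = "first"` is set because the default breaks ties at random.

```r
library(geosphere)

# Data from the question
df1 <- data.frame(
  g_lat   = c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493),
  g_lon   = c(-122.67648, -86.13490, -118.24368, -122.08385, -122.41942),
  address = 1:5
)
df2 <- data.frame(
  g_lat = c(37.7737185, 45.5222208, 37.77493),
  g_lon = c(-122.2744317, -98.7041549, -122.41942),
  msa   = 1:3
)

# Distance matrix in meters: one row per address, one column per msa
mat <- distm(df1[, c("g_lon", "g_lat")], df2[, c("g_lon", "g_lat")],
             fun = distVincentyEllipsoid)

# Column index of the closest msa for each address
nearest <- max.col(-mat, ties.method = "first")

# Attach the closest msa and its distance (in meters) to every address
df1$near_msa  <- df2$msa[nearest]
df1$near_dist <- mat[cbind(seq_len(nrow(mat)), nearest)]
df1
```

Unlike the radius filter, this keeps every address, including those whose closest msa is far more than 50 miles away; filtering on `near_dist` afterwards would combine both conditions.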