This answer was helpful in looking at potential solutions.
The first step is to create a distance matrix with distm (from the geosphere package)
based on both of your data frames. The example here uses distHaversine,
but other distance functions are available (such as distGeo or distVincentyEllipsoid).
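To make it clearer what the distance matrix contains, here is a minimal base R sketch of the haversine formula on a spherical Earth. The helper names haversine_m and cross_dist are made up for illustration and are not part of geosphere; the radius of 6378137 m is assumed to match the default used by distHaversine. In practice distm does this work for you.

```r
# Haversine great-circle distance in meters (illustrative helper, not a geosphere function).
# Assumes a spherical Earth of radius r, in meters.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Cross-distance matrix: one row per customer, one column per store
cross_dist <- function(cust, stores) {
  outer(seq_len(nrow(cust)), seq_len(nrow(stores)),
        function(i, j) haversine_m(cust$long[i], cust$lat[i],
                                   stores$long[j], stores$lat[j]))
}

# Same made-up example data as in the Data section of this answer
cust <- data.frame(lat = c(41.8, 40.7, 36.8), long = c(87.6, 74, 119.4))
stores <- data.frame(lat = c(44.5, 44.5, 43.8), long = c(88.7, 72.5, 120.5))

d <- cross_dist(cust, stores)  # 3 x 3 matrix of distances in meters
```

Each entry d[i, j] is the distance from customer i to store j, which is the same layout distm returns.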
Then you can determine which store is closest for each customer with max.col.
The negative sign before mat means the maximum in each row is the value
that is least negative, which corresponds to the shortest distance.
You can also add the distance to that store from the matrix (distm returns meters, so the code divides by 1000 to get km).
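As a small sketch of the selection step on its own, the toy matrix below uses made-up distances and includes a tie. Note that max.col breaks ties at random by default, so this sketch passes ties.method = "first" to make the result repeatable; whether that matters depends on your data.

```r
# Toy distance matrix in meters (made-up values): rows are customers, columns are stores
mat <- matrix(c(500, 200, 900,
                300, 100, 100),
              nrow = 2, byrow = TRUE)
stores <- c("A", "B", "C")

# Negating the matrix turns "smallest distance" into "largest value".
# ties.method = "first" picks the first of several equally close stores;
# the default, "random", would pick among them at random.
idx <- max.col(-mat, ties.method = "first")

nearest_store <- stores[idx]

# Look up the distance of the chosen store in each row and convert meters to km
nearest_km <- mat[cbind(seq_len(nrow(mat)), idx)] / 1000
```

Indexing the matrix with idx keeps the reported distance tied to the store that was actually chosen, and gives the same values as apply(mat, 1, min) / 1000.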
I made up some example data from the U.S. and renamed the stores to A, B, C for clarity in the answer.
library(geosphere)
# create a distance matrix
mat <- distm(customer_data[,c('long','lat')], store_data[,c('long','lat')], fun=distHaversine)
# assign the store name to customer_data based on shortest distance in the matrix
customer_data$locality <- store_data$store[max.col(-mat)]
# add distance in km for that store
customer_data$nearest_dist <- apply(mat, 1, min)/1000
Output
  customer_id  lat  long locality nearest_dist
1           1 41.8  87.6        A     313.5497
2           2 40.7  74.0        B     440.4867
3           3 36.8 119.4        C     784.7909
Data
customer_data <- data.frame(
  customer_id = c(1, 2, 3),
  lat = c(41.8, 40.7, 36.8),
  long = c(87.6, 74, 119.4)
)
store_data <- data.frame(
  store = c("A", "B", "C"),
  lat = c(44.5, 44.5, 43.8),
  long = c(88.7, 72.5, 120.5)
)