Your question is a bit more complicated than a simple merge, and outer
is somewhat ill-suited for the purpose. To be as thorough as possible, we want to calculate the distance between all combinations of houses and stations, then keep only the closest station per house. We'll need two packages:
library(tidyverse)
library(geosphere)
First, a bit of prep. distm
expects coordinates to be ordered as longitude first, latitude second (you have the opposite), so let's fix that, name the columns better, and correct a typo while we're at it:
houses <- data.frame(house_number = c(1:3),
lon_house = seq(1.4, 1.6, by = 0.1),
lat_house = seq(1.1, 1.3, by = 0.1)
)
stations <- data.frame(station_number = c(1:11),
lon_station = seq(2, 3, by = 0.1),
lat_station = seq(1, 2, by = 0.1)
)
We'll create "nested" data frames so that it's easier to keep coordinates together:
house_nest <- nest(houses, -house_number, .key = 'house_coords')
station_nest <- nest(stations, -station_number, .key = 'station_coords')
house_number house_coords
<int> <list>
1 1 <data.frame [1 × 2]>
2 2 <data.frame [1 × 2]>
3 3 <data.frame [1 × 2]>
station_number station_coords
<int> <list>
1 1 <data.frame [1 × 2]>
2 2 <data.frame [1 × 2]>
3 3 <data.frame [1 × 2]>
4 4 <data.frame [1 × 2]>
5 5 <data.frame [1 × 2]>
6 6 <data.frame [1 × 2]>
7 7 <data.frame [1 × 2]>
8 8 <data.frame [1 × 2]>
9 9 <data.frame [1 × 2]>
10 10 <data.frame [1 × 2]>
11 11 <data.frame [1 × 2]>
Use dplyr::crossing
to combine every row from both data frames:
data.master <- crossing(house_nest, station_nest)
house_number house_coords station_number station_coords
<int> <list> <int> <list>
1 1 <data.frame [1 × 2]> 1 <data.frame [1 × 2]>
2 1 <data.frame [1 × 2]> 2 <data.frame [1 × 2]>
3 1 <data.frame [1 × 2]> 3 <data.frame [1 × 2]>
4 1 <data.frame [1 × 2]> 4 <data.frame [1 × 2]>
5 1 <data.frame [1 × 2]> 5 <data.frame [1 × 2]>
6 1 <data.frame [1 × 2]> 6 <data.frame [1 × 2]>
7 1 <data.frame [1 × 2]> 7 <data.frame [1 × 2]>
8 1 <data.frame [1 × 2]> 8 <data.frame [1 × 2]>
9 1 <data.frame [1 × 2]> 9 <data.frame [1 × 2]>
10 1 <data.frame [1 × 2]> 10 <data.frame [1 × 2]>
# ... with 23 more rows
With all this now in place, we can use distm
on each row to calculate a distance, and keep the shortest distance per house:
data.dist <- data.master %>%
mutate(dist = map2_dbl(house_coords, station_coords, distm)) %>%
group_by(house_number) %>%
filter(dist == min(dist))
house_number house_coords station_number station_coords dist
<int> <list> <int> <list> <dbl>
1 1 <data.frame [1 × 2]> 1 <data.frame [1 × 2]> 67690.
2 2 <data.frame [1 × 2]> 1 <data.frame [1 × 2]> 59883.
3 3 <data.frame [1 × 2]> 1 <data.frame [1 × 2]> 55519.