1

I have two cities and one individual. I'd like to find the city that is closest to this individual. To do that, I can use sf::st_nearest_feature(). However, so far it only uses the distance as the crow flies between the individual and each city. I'd like to add the constraint that the path must stay inside a polygon.

In the example below:

  • individual (red triangle) is closer to city A than to city B if we consider distance as the crow flies;
  • however if we add the constraint that the individual can only move inside the polygon, then it is closer to city B.
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(ggplot2)
library(ggrepel)
library(rnaturalearth)

background <- ne_countries(scale = 'small', type = 'map_units', returnclass = 'sf') |>
  subset(name %in% c("England", "Wales")) |> 
  st_union()

cities <- data.frame(
  name = c("A", "B"),
  lon = c(-4.3, -3.3),
  lat = c(51.2, 51.45)
) |> 
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

individual <- data.frame(id = 1, lon = -4.3, lat = 51.6) |> 
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

ggplot() +
  geom_sf(data = background) +
  geom_sf(data = cities, size = 3) +
  geom_sf(data = individual, color = "red", shape = 17, size = 3) +
  coord_sf(xlim = c(-6, -1), ylim = c(50, 52)) +
  geom_text_repel(
    data = cities, 
    aes(geometry = geometry, label = name),
    stat = "sf_coordinates",
  )
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data

st_nearest_feature() tells me that the nearest city is A:

nearest <- st_nearest_feature(individual, cities)
cities[nearest, "name"]
#> Simple feature collection with 1 feature and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -4.3 ymin: 51.2 xmax: -4.3 ymax: 51.2
#> Geodetic CRS:  WGS 84
#>   name          geometry
#> 1    A POINT (-4.3 51.2)

How can I modify my measure of distance so that the nearest city is B? If possible, the solution should scale well to do this for millions of individuals (the number of cities will stay small) in a reasonable time.

bretauv
  • 7,756
  • 2
  • 20
  • 57

1 Answers1

1

You can create LINESTRING representations of the paths and use st_contains() to find which are fully inside the polygon. Then select the shortest one after having disqualified those not inside the polygon by setting them to infinity:

paths <- st_cast(st_union(st_geometry(cities), st_geometry(individual)), 
                 'LINESTRING')
path.len <- st_length(paths)
path.len[!st_contains(background, paths, sparse=F)] <- Inf
cities[which.min(path.len), "name"]
# Simple feature collection with 1 feature and 1 field
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -3.3 ymin: 51.45 xmax: -3.3 ymax: 51.45
# Geodetic CRS:  WGS 84
#   name           geometry
# 2    B POINT (-3.3 51.45)

For more than one individual

Data:

individual <- data.frame(id = 1:3, 
                         lon = -4.3 + c(.7, 0, 1), 
                         lat = 51.6 + c(-.3, 0, 0)) |> 
  st_as_sf(coords = c("lon", "lat"), crs = 4326)

ggplot() +
  geom_sf(data = background) +
  geom_sf(data = cities, size = 3) +
  geom_sf(data = individual, color = "red", shape = 17, size = 3) +
  coord_sf(xlim = c(-6, -1), ylim = c(50, 52)) +
  geom_text_repel(
    data = cities, 
    aes(geometry = geometry, label = name),
    stat = "sf_coordinates",
  ) + 
  geom_text_repel(
    data = individual, 
    aes(geometry = geometry, label = id),
    stat = "sf_coordinates",
  )

three individuals and two cities

Basically do the same succesively for each individual:

nearest <- sapply(seq_along(st_geometry(individual)), \(i) {
  paths <- st_cast(st_union(st_geometry(cities), st_geometry(individual)[i]), 
                   'LINESTRING')
  path.len <- st_length(paths)
  path.len[!st_contains(background, paths, sparse=F)] <- Inf
  which.min(path.len)
})
cities[nearest, 'name']
# Simple feature collection with 3 features and 1 field
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: -4.3 ymin: 51.2 xmax: -3.7 ymax: 51.5
# Geodetic CRS:  WGS 84
#     name          geometry
# 1      A POINT (-4.3 51.2)
# 2      B POINT (-3.7 51.5)
# 2.1    B POINT (-3.7 51.5)
Robert Hacken
  • 3,878
  • 1
  • 13
  • 15