Can I bound an st_distance call by a polygon?

Question

I have seen similar posts on this topic (see, for example, here and here) but not one that is specific to the sf-tidyverse ecosystem.

I have a series of lakes, a series of sample points within each lake, and a "focal point" in each lake that represents where a boat launch is. I want to calculate the "boater's shortest travel distance" to each sample point from the boat launch. However, I want to somehow "bound" these distances using the lake polygon such that distances cannot be calculated across land. I could imagine this being done by having the "straight line" snake along the lake edge for as long as needed before it can resume being a straight line. I could also imagine the "straight line" being decomposed into line segments that bend around any intervening land. I'm open to a variety of implementations!

I have seen elsewhere (such as here) the idea of converting the bounding polygon to a raster, making the water one value and the land another, much higher value, and then doing a "least-cost distance," where the cost of going over land is prohibitive. However, I don't know how one would actually do this in the raster/sf ecosystem.

Here's the code I used to make this map:

library(dplyr)
library(sf)
library(ggplot2)
Moose.ssw = sswMN.sf %>% filter(lake == "Moose")
Moose.lake = MN_lakes4 %>% filter(str_detect(map_label, "Moose")) %>% filter(cty_name == "Beltrami") 
Moose.access = try3 %>% filter(LAKE_NAME == "Moose") %>%  filter(COUNTYNAME == "Beltrami")
Moose.box = st_bbox(Moose.ssw)

ggplot() +
  geom_sf(data = Moose.lake, color="lightblue") + 
  geom_sf(data = Moose.access, color = "red", size = 2) +
  geom_sf(data = Moose.ssw, mapping = aes(color= Nitellopsis_obtusa_n)) +
  coord_sf(xlim = c(Moose.box[1], Moose.box[3]), ylim = c(Moose.box[2], Moose.box[4]))

And here's some code that calculates straight-line distances splendidly--maybe it can be wrappered somehow?

Moose.pt.dists = st_distance(Moose.access, Moose.ssw, by_element = TRUE)

Files needed to make the data objects referenced above can be pulled from my Github page (they are files produced via dput(). Link to my Github.

I am a solid R programmer but I am new to geospatial work, so if I could even just be pointed in a fruitful direction, I may be able to program my own way out of this!

mrhellmann · Accepted Answer · 2021-08-18T15:15:50.040

Here's a solution using sfnetworks, which fits in with the tidyverse well.

The code below should

regularly sample the bounding box of the lake (creates evenly-spaced points)
connect them to their closest neighbors
remove the connections that cross land
find the shortest path from the boat launch to the sample location(s) by following the lines that remain.

The results are not exact, but are pretty close. You can increase precision by increasing the number of sampled points. 1000 points are used below.


library(tidyverse)
library(sf)
library(sfnetworks)
library(nngeo)

# set seed to make the script reproducible,
#  since there is random sampling used
set.seed(813)

# Getting your data:
x <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
# Subset to get just one lake
moose_lake <- x[5,]
boat_ramp <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
sample_locations <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
sample_bbox <- dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")


# sample the bounding box with regular square points, then connect each point to the closest 9 points
#  8 should've worked, but left some diagonals out.
sq_grid_sample <- st_sample(st_as_sfc(st_bbox(moose_lake)), size = 1000, type = 'regular') %>% st_as_sf() %>%
  st_connect(.,.,k = 9)

# remove connections that are not within the lake polygon
sq_grid_cropped <- sq_grid_sample[st_within(sq_grid_sample, moose_lake, sparse = F),]

# make an sfnetwork of the cropped grid
lake_network <- sq_grid_cropped %>% as_sfnetwork()

# find the (approximate) distance from boat ramp to point 170 (far north)
pt170 <- st_network_paths(lake_network, 
                          from = boat_ramp,
                          to = sample_locations[170,]) %>%
  pull(edge_paths) %>%
  unlist()

lake_network %>% 
  activate(edges) %>%
  slice(pt170) %>%
  st_as_sf() %>%
  st_combine() %>%
  st_length()
#> 2186.394 [m]

Looks like it is about 2186m from the boat launch to sample location 170 at the far north end of the lake.


# Plotting all of the above, path from boat ramp to point 170:

ggplot() + 
  geom_sf(data = sq_grid_sample, alpha = .05) +
  geom_sf(data = sq_grid_cropped, color = 'dodgerblue') + 
  geom_sf(data = moose_lake, color = 'blue', fill = NA) +
  geom_sf(data = boat_ramp, color = 'springgreen', size = 4) + 
  geom_sf(data = sample_locations[170,], color = 'deeppink1', size = 4) +
  geom_sf(data = lake_network %>% 
            activate(edges) %>%
            slice(pt170) %>%
            st_as_sf(),
          color = 'turquoise',
          size = 2) +
  theme_void()

Though only the distance from the boat launch to one sample point is found and plotted above, the network is there to find all of the distances. You may need to use *apply or purrr, and maybe a small custom function to find the 'one-to-many' distances of the launch to all sample points.

This page on sfnetworks will be helpful in writing the one-to-many solution.

edit:

To find all distances from the boat launch to the sample points:

st_network_cost(lake_network,
                from = boat_ramp,
                to = sample_locations)

This is much faster for me than a for loop or the sp solution posted below. Some code in the sampling may need to be adjusted since the st_network_cost will remove any duplicate distances. The sample_locations (or Moose.ssw in @bajcz answer) will need to be cleaned to remove duplicate points as well. There are 180 unique points of the 322 rows.

+1 an excellent answer and also very elegant! See my answer to my own question below also--I think I may have spotted a typo in your code? Also, if you can suss out what the *apply solution is to vectorize where I have put in a for loop, I can add that in and see if it makes the code faster! Thanks! — Bajcz, Aug 17 '21 at 18:54
Thanks @Bajcz. I'll take another look at it for typos & *apply/purrr functionality soon. I think there are a couple of simple improvements to decrease computation time & increase distance accuracy. — mrhellmann, Aug 17 '21 at 19:19
@Bajcz I mistakenly thought you wanted the route & distance to each point. I added a faster method to find only the distances. In my testing the sp method took ~4mins, while the (new) sfnetworks method was about 4 seconds. Let me know if you're interested in other minor changes that improve accuracy. I didn't want to make a large-ish edit to the accepted answer. — mrhellmann, Aug 18 '21 at 15:21
Awesome thanks! I think knowing how to do the routes will be helpful for many, even if not needed in my particular case. I mentioned in my post that, in my context, sometimes, boat docks are *just a bit* off the lake which causes distances to come back as Inf. I post a fix for this in my answer--installing a buffer of "fake lake" around them. Is there a similar workaround in your sfnetworks approach? — Bajcz, Aug 25 '21 at 14:57
@Bajcz There are a few workarounds for that, the best might be to use [st_network_blend](https://luukvdmeer.github.io/sfnetworks/reference/st_network_blend.html) to add the dock point to the network. A simple solution would be to just add the fixed distance from the network to the dock to all computed distances. `st_network_cost` doesn't need the dock to be on or within the network. — mrhellmann, Aug 25 '21 at 15:26
Thanks! Yes, adding `%>% st_network_blend(boat.ramp)` after the call to `as_sfnetwork()` will "snap" access points to their nearest point along the edge *and* make them a point in the network, which makes what the functions are doing implicitly (by default, they are snapped to the nearest node on the edge rather than just to the edge) a bit more explicit. For those interested, the documentation [https://cran.r-project.org/web/packages/sfnetworks/vignettes/routing.html](here) is quite digestible. — Bajcz, Aug 25 '21 at 16:26
This is a great answer that helped me with a related problem! One note folks might find helpful is that for very large grids in `st_sample()` (i.e. size > 5000), `st_within` is suuuupppperrrr slow, but thanks to @dbaston comment [here](https://gis.stackexchange.com/questions/353140/optimising-use-of-st-within-for-parallelisation), I figured out you can swap with `st_contains` which for this use case would end up being `sq_grid_sample[st_contains( moose_lake, sq_grid_sample, sparse = F),]`, and it speeds it up like >50x especially for really large sizes (e.g. 500,000) — colebrookson, Apr 29 '23 at 23:36

score 1 · Answer 2 · answered Aug 17 '21 at 18:53

I was planning to answer my own question today, as I had just managed to get the code using the gdistance and sp method linked to in my question working, but mrhellmann beat me to it! Since their answer works also and is quite nice and elegant, I thought I'd just post code here that utilizes both approaches and compares the outcomes for those who are interested (and in case one or the other doesn't work in your context). They are about equivalently fast, although that's with a for() loop in mrhellmann's sfnetworks version, so it might be faster if that bit can be vectorized, which I'm sure is possible.

   #All the hooplah needed to get things started and talking the same language. 
    x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
    # Subset to get just one lake
    Moose.lake = x[5,]
    Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
    Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
    Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")
    
    library(sf)
    library(sp)
    library(raster)
    library(gdistance)
    library(tidyverse)
    library(sfnetworks)
    library(nngeo)
    library(mosaic)

This is "my" way using sp and gdistance tools:

    ptm <- proc.time()
    
#Make all the objects needed into spatial class objects.
    test.pts = as(Moose.access, Class = "Spatial") 
    ssw.pts = as(Moose.ssw, Class = "Spatial")
    test.bounds = as(Moose.lake, Class = "Spatial")
    
#Turn the lake into a raster of resolution 1000 x 1000 (which is arbitrary) and then make all points not in the lake = 0 so that no distances can "afford" to cross over land.
    test.raster = raster(extent(test.bounds), nrow=1000, ncol=1000)
    lake.raster = rasterize(test.bounds, test.raster)                       
    lake.raster[is.na(lake.raster)] = 0  
    
    ##For some lakes, although not this one, the public access was just off the lake, which resulted in distances of Inf. This code puts a circular buffer around the dock locations to prevent this. It makes a new raster with 1s at the dock location(s), finds all indices of cells within the buffer distance of each dock (here, 300, which is overkill), and makes the corresponding cells in the lake raster also 1 so that they are considered navigable. This makes the distances slightly inaccurate because it may allow some points to be more easily reachable than they should be, but it is better than the alternative!
    access.raster = rasterize(test.pts, lake.raster, field = 1)
    index.spots = raster::extract(access.raster, test.pts, buffer=300, cellnumbers = T)
    index.spots2 = unlist(lapply(index.spots, "[", , 1))
    lake.raster[index.spots2] = 1
    
#Make a transition matrix so that the cost of moving from one cell to the next can be understood. TBH, I don't understand this part well beyond that.
    transition.mat1 = transition(lake.raster, transitionFunction=mean, directions=16) #This code does take a little while.
    transition.mat1 = geoCorrection(transition.mat1,type="c") 
    
#Calculates the actual cost-based distances.
    dists.test = costDistance(transition.mat1, test.pts, ssw.pts)
    proc.time() - ptm  #About 55 seconds on my laptop.

And then for comparison, here's the sfnetworks way as provided by mrhellmann.

ptm <- proc.time()

sq_grid_sample = st_sample(st_as_sfc(st_bbox(Moose.lake)), size = 1000, type = 'regular') %>% st_as_sf() %>%
  st_connect(.,.,k = 9)

sq_grid_cropped = sq_grid_sample[st_within(sq_grid_sample, Moose.lake, sparse = F)] #I needed to delete a comma in this line to get it to work--I don't think we needed row,column indexing here.

lake_network = sq_grid_cropped %>% as_sfnetwork()

##Using their code to generate all the distances between the dock and the sample points. This might be vectorizable. 
dists.test2 = rep(0, nrow(Moose.ssw))
for (i in 1:nrow(Moose.ssw)) {
dist.pt = st_network_paths(lake_network, 
                          from = Moose.access,
                          to = Moose.ssw[i,]) %>%
  pull(edge_paths) %>%
  unlist() #This produces a vertices we must go through I think?

dists.test2[i] = lake_network %>% 
  activate(edges) %>%
  slice(dist.pt) %>% 
  st_as_sf() %>%
  st_combine() %>%
  st_length()
}
proc.time() - ptm #About 58 seconds on my laptop.

Here are two graphs that I think demonstrate that both approaches produce relatively similar numbers without a lot of signs of bias, although it does look like some of the distances ended up being a little longer for the sfnetworks approach. The first is overlapped density plots of the distance estimates using the two approaches and the second is a "1-to-1" scatterplot of the distances plotted against each other, and you can see that the fit to a 1-to-1 line would be pretty good.

hist.df = data.frame(dists = c(dists.test, dists.test2), version = rep(c("sp", "sfnetworks"), each = 322))

gf_density(~dists, fill=~version, data=hist.df )
gf_point(dists.test2~dists.test)

hugh-allan · Answer 3 · 2022-05-11T05:24:54.140

I know this question has been answered, but I thought I'd throw an alternative method in here anyway. And, it looks like (for me at least) this approach is a little faster than the others. This uses terra, which is the "replacement" for the raster package; many functions have the same name and do the same job, but the main difference is that terra uses SpatRaster objects, whereas raster uses raster objects.

# import data ---

x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")

# Subset to get just one lake
Moose.lake = x[5,]

Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")

The first thing we do is make the lake into a raster, which is called a SpatRaster by terra. We use the extent (terra::ext()) of Moose.lake, 1000 rows and 1000 cols, and set the crs to the same as Moose.lake. Initially, we give every cell a value of 1, but then we can use terra::mask() to give every value outside Moose.lake a value of 2 (these will be the "high cost" or "no go zone" cells).

# step 1 - rasterize() the lake ---

# find the spatial extent of the lake
ext(Moose.lake) %>% 
  # make a raster with the same extent, and assign all cells to value of 1
  rast(nrow = 1000, ncol = 1000, crs = st_crs(Moose.lake)[1], vals = 1) %>% 
  # set all cells outside the lake a value of 2
  mask(vect(Moose.lake), updatevalue = 2) %>% 
  {. ->> moose_rast}

moose_rast

# class       : SpatRaster 
# dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
# resolution  : 2.899889, 3.239947  (x, y)
# extent      : 388474, 391373.9, 5265182, 5268422  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=utm +zone=15 +datum=NAD83 +units=m +no_defs 
# source      : memory 
# name        : lyr.1 
# min value   :     1 
# max value   :     2 

plot(moose_rast)

Then, we give the cell where Moose.access is a value of 3 (or any value other than 1 or 2) - this will be our starting point.

# now, make the boat ramp a value of 3 (this is the starting point)
# find the cell number based on the coordinates of Moose.access
moose_rast %>%
  cellFromXY(st_coordinates(Moose.access)) %>%
  {. ->> access_cell}

access_cell
# [1] 561618

# assign that cell a value of 3 (or any number you want other than 1 or 2)
values(moose_rast)[access_cell] <- 3

# check it
moose_rast %>% plot

Although you can't really see it on the plot because the cell is so small, we can tell that the cell value has changed because our legend now includes a value of 3.

Next up, we use terra::gridDistance() to find the distance from the starting cell (value of 3), to every other cell. We specify origin = 3 to assign this cell as the starting point, and we tell it not to traverse any cells with value of 2 using omit = 2. This function returns a SpatRaster, but this time the cell values are the distances to the origin.

# now, find distances from the access_cell to every other cell
moose_rast %>% 
  gridDistance(origin = 3, omit = 2) %>% 
  {. ->> moose_rast_distances}

moose_rast_distances

# class       : SpatRaster 
# dimensions  : 1000, 1000, 1  (nrow, ncol, nlyr)
# resolution  : 2.899889, 3.239947  (x, y)
# extent      : 388474, 391373.9, 5265182, 5268422  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=utm +zone=15 +datum=NAD83 +units=m +no_defs 
# source      : memory 
# name        :    lyr.1 
# min value   :        0 
# max value   : 2424.096 

# check it
moose_rast_distances %>% plot

In this plot, the areas closest to the access cell are white, and those farthest away are green.

So now all we have to do is find the cell numbers of the sample sites within the lake and pull out their cell values. We use terra::cellFromXY() to find the cell numbers, based on a set of XY coordinates.

# now, find the cell numbers of all the study sites
moose_rast %>% 
  cellFromXY(st_coordinates(Moose.ssw)) %>% 
  {. ->> site_cells}

# cell numbers
site_cells %>% 
  head(50)
# [1] 953207 953233 930156 930182 930207 930233 907156 907182 907207 884078 884130
# [12] 884156 884182 884207 861052 861078 861104 861130 861156 861182 861207 837026
# [23] 837052 837078 837104 837130 837156 837182 837207 814026 814052 814078 814104
# [34] 814130 814156 814182 814207 791026 791104 791130 791156 791182 768182 745182
# [45] 745207 722026 722233 699259 675052 675285

# and now their distance values
values(moose_rast_distances)[site_cells] %>% head(50)
# [1] 2212.998 2241.812 2144.020 2115.206 2138.479 2167.293 2069.501 2040.688 2063.960
# [10] 2081.424 2023.796 1994.983 1966.169 1989.441 2078.719 2006.905 1978.092 1949.278
# [19] 1920.464 1891.650 1914.923 2119.358 2043.960 1968.563 1900.333 1871.519 1842.705
# [28] 1813.891 1837.164 2086.047 2010.650 1935.253 1859.856 1797.000 1768.186 1739.372
# [37] 1762.645 2052.736 1826.545 1751.148 1693.667 1664.854 1590.335 1533.733 1488.110
# [46] 1952.805 1384.778 1281.445 1809.338 1174.872

And to make the distances a little more user friendly, we can instead put them into a new column in the sf collection.

# now, make a new column in the study sites which is the distance to the access_cell
Moose.ssw %>% 
  rowwise %>%
  mutate(
    distance_to_access = cellFromXY(moose_rast, st_coordinates(geometry)) %>% values(moose_rast_distances)[.]
  ) %>% 
  select(distance_to_access, everything())

# Simple feature collection with 322 features and 122 fields
# Geometry type: POINT
# Dimension:     XY
# Bounding box:  xmin: 388549 ymin: 5265332 xmax: 391324 ymax: 5268332
# CRS:           +proj=utm +zone=15 +datum=NAD83 +unit=m
# # A tibble: 322 x 123
# # Rowwise: 
#  distance_to_access lake  county       dow lake_size_acres contact      year_discovered first_year_trea~ survey_year
#                <dbl> <chr> <chr>      <int>           <dbl> <chr>                  <int> <chr>                  <int>
# 1               2213. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 2               2242. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 3               2144. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 4               2115. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 5               2138. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 6               2167. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 7               2070. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 8               2041. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 9               2064. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# 10              2081. Moose Beltrami 4001100            601. Nicole Kovar            2016 NoTreat                 2017
# # ... with 312 more rows, and 114 more variables: survey_dates <chr>, surveyor <chr>, pointid <int>, depth_ft <dbl>,
# #   Myriophyllum_spicatum_or_hybrid_n <int>, Potamogeton_crispus_n <int>, Bidens_beckii_n <int>,
# #   Brasenia_schreberi_n <int>, Ceratophyllum_demersum_n <int>, Ceratophyllum_echinatum_n <int>, Chara_sp_n <int>,
# #   Eleocharis_acicularis_n <int>, Eleocharis_palustris_n <int>, Elodea_canadensis_n <int>,
# #   Elodea_nuttallii_n <int>, Heteranthera_dubia_n <int>, Lemna_minor_n <int>, Lemna_trisulca_n <int>,
# #   Myriophyllum_heterophyllum_n <int>, Myriophyllum_sibiricum_n <int>, Myriophyllum_verticillatum_n <int>,
# #   Najas_flexilis_n <int>, Najas_gracillima_n <int>, Najas_guadalupensis_n <int>, Najas_marina_n <int>, ...

The distances are in metres because the CRS is UTM.

Additionally, gridDistance() could also be used to find the shortest distance from every point within the lake, to the lake'e edge. To do this we say origin = 2, which rather than just a single point like our boat ramp earlier, is every cell on land.

moose_rast %>% 
  gridDistance(origin = 2) %>% 
  plot

So I ran the two approaches in your answer above, and compared the results and times to this method.

Time-wise, the terra approach was much faster, on my computer anyway:

sp ~230 secs
sfnetworks ~83 secs
terra ~5 secs

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sp

#All the hooplah needed to get things started and talking the same language. 
x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
# Subset to get just one lake
Moose.lake = x[5,]
Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")

library(sf)
library(sp)
library(raster)
library(gdistance)
library(tidyverse)
library(sfnetworks)
library(nngeo)
library(mosaic)



ptm <- proc.time()

#Make all the objects needed into spatial class objects.
test.pts = as(Moose.access, Class = "Spatial") 
ssw.pts = as(Moose.ssw, Class = "Spatial")
test.bounds = as(Moose.lake, Class = "Spatial")

#Turn the lake into a raster of resolution 1000 x 1000 (which is arbitrary) and then make all points not in the lake = 0 so that no distances can "afford" to cross over land.
test.raster = raster(extent(test.bounds), nrow=1000, ncol=1000)
lake.raster = rasterize(test.bounds, test.raster)                       
lake.raster[is.na(lake.raster)] = 0  

##For some lakes, although not this one, the public access was just off the lake, which resulted in distances of Inf. This code puts a circular buffer around the dock locations to prevent this. It makes a new raster with 1s at the dock location(s), finds all indices of cells within the buffer distance of each dock (here, 300, which is overkill), and makes the corresponding cells in the lake raster also 1 so that they are considered navigable. This makes the distances slightly inaccurate because it may allow some points to be more easily reachable than they should be, but it is better than the alternative!
access.raster = rasterize(test.pts, lake.raster, field = 1)
index.spots = raster::extract(access.raster, test.pts, buffer=300, cellnumbers = T)
index.spots2 = unlist(lapply(index.spots, "[", , 1))
lake.raster[index.spots2] = 1

#Make a transition matrix so that the cost of moving from one cell to the next can be understood. TBH, I don't understand this part well beyond that.
transition.mat1 = transition(lake.raster, transitionFunction=mean, directions=16) #This code does take a little while.
transition.mat1 = geoCorrection(transition.mat1,type="c") 

#Calculates the actual cost-based distances.
dists.test = costDistance(transition.mat1, test.pts, ssw.pts)
proc.time() - ptm  
# 234 secs


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# sfnetworks

#All the hooplah needed to get things started and talking the same language. 
x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
# Subset to get just one lake
Moose.lake = x[5,]
Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")

library(sf)
library(sp)
library(raster)
library(gdistance)
library(tidyverse)
library(sfnetworks)
library(nngeo)
library(mosaic)


ptm <- proc.time()

sq_grid_sample = st_sample(st_as_sfc(st_bbox(Moose.lake)), size = 1000, type = 'regular') %>% st_as_sf() %>%
  st_connect(.,.,k = 9)

sq_grid_cropped = sq_grid_sample[st_within(sq_grid_sample, Moose.lake, sparse = F)] #I needed to delete a comma in this line to get it to work--I don't think we needed row,column indexing here.

lake_network = sq_grid_cropped %>% as_sfnetwork()

##Using their code to generate all the distances between the dock and the sample points. This might be vectorizable. 
dists.test2 = rep(0, nrow(Moose.ssw))
for (i in 1:nrow(Moose.ssw)) {
  dist.pt = st_network_paths(lake_network, 
                             from = Moose.access,
                             to = Moose.ssw[i,]) %>%
    pull(edge_paths) %>%
    unlist() #This produces a vertices we must go through I think?
  
  dists.test2[i] = lake_network %>% 
    activate(edges) %>%
    slice(dist.pt) %>% 
    st_as_sf() %>%
    st_combine() %>%
    st_length()
}
proc.time() - ptm 
# 83 secs


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# terra

#All the hooplah needed to get things started and talking the same language. 
x = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.lake")
# Subset to get just one lake
Moose.lake = x[5,]
Moose.access = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.access")
Moose.ssw  = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.ssw")
Moose.box = dget("https://raw.githubusercontent.com/BajczA475/random-data/main/Moose.box")

library(tidyverse)
library(sf)
library(terra)


ptm <- proc.time()

# make rastre of moose lake
ext(Moose.lake) %>% 
  # make a raster with the same extent, and assign all cells to value of 1
  rast(nrow = 1000, ncol = 1000, crs = st_crs(Moose.lake)[1], vals = 1) %>% 
  # set all cells outside the lake a value of 2
  mask(vect(Moose.lake), updatevalue = 2) %>% 
  {. ->> moose_rast}

# find access cell
moose_rast %>%
  cellFromXY(st_coordinates(Moose.access)) %>%
  {. ->> access_cell}

# assign that cell a value of 3 (or any number you want other than 1 or 2)
values(moose_rast)[access_cell] <- 3

# find values to every other cell
moose_rast %>% 
  gridDistance(origin = 3, omit = 2) %>% 
  {. ->> moose_rast_distances}

# make column with distances to each sample site
distances <- Moose.ssw %>% 
  rowwise %>%
  mutate(
    distance_to_access = cellFromXY(moose_rast, st_coordinates(geometry)) %>% values(moose_rast_distances)[.]
  ) %>% 
  select(distance_to_access, everything())

proc.time() - ptm 
# 5 secs



#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

hist.df2 = data.frame(dists = c(dists.test, dists.test2, distances$distance_to_access), version = rep(c("sp", "sfnetworks", "terra"), each = 322))
gf_density(~dists, fill=~version, data=hist.df2)

And the distances values from the terra method are pretty similar to the sp method:

+1 for sure! Extremely clear and elegant presentation--I am eager to try it out for myself! Plus, I appreciate the use of the more "modern" packages here. A great contribution to the discussion! — Bajcz, May 10 '22 at 14:37
Thank you, I'm glad it's useful! A lot of the stuff we do is in rivers and lakes, like working out how far fish have to swim between points so this is type of analysis is common :) — hugh-allan, May 11 '22 at 05:27
I also appreciated seeing a rightward assignment operator -> in use...hadn't really encountered that before. My thinking is that there really should be a packaged-up solution to this type of challenge in "bounded" systems like rivers and lakes... — Bajcz, May 11 '22 at 15:18
So, testing this out, it does indeed work great and it's super fast! As a bonus, if there are multiple origins, it looks like the distances raster will only store the shortest available distance, which is what I want anyway! However, what if there are, say, two access points, and they are of two different "types," e.g. "public" and "private." For every sample point, I also want to know the *type* of the closest access, not just the distance--any efficient way to bring along attribute info like this, or would I need to figure that out separately somehow? — Bajcz, May 11 '22 at 16:00
^^Going to answer my own question! Do the distance calculations separately for the two access types (public, then private) and, for each point, figure out which distance value is lesser and use that to snag the proper type! — Bajcz, May 11 '22 at 16:03
I'm going to answer one more of my questions: Suppose you wanted to figure out, for each point, how many accesses in total are within, say, 1km. You could loop over each access you have, store the distance values to all points in a matrix, then count how many of them are <1km per point. Pretty cool! — Bajcz, May 11 '22 at 17:05
Yeah I think I saw the `{. ->> x}` on SO somewhere once, and I use it all the time! Tbh I suspect the double `>>` is overkill, but it works and it never really got explained to me but I just use it haha. It makes sense to me to use it in longer pipelines and things, because saving the object is kinda the last thing you do, once the data is in the right shape. Yeah I would agree, if you wanted public vs private access my approach would be to run both access points separately, then compare the distances to each type of access. — hugh-allan, May 12 '22 at 06:42
And the same for any number of access points - `loop` (or `map`) through each access point, save resulting distances with an access point ID, then once the data is in a grouped `tibble` the world is your oyster really! — hugh-allan, May 12 '22 at 06:44
To elaborate, the "downside" (read: difference) of using the much faster `terra` method is that you don't have the ability (that I know of) to include attribute data like _which_ access point was closer. Raster-type data is handled differently to dataframe-type data (like `sf` objects), where I believe each cell is kinda stored as a row and cell number, relative to the raster's origin and resolution, rather than the cell's particular coordinates. It makes for some super fast analysis, but you just have to treat it a little differently :) — hugh-allan, May 12 '22 at 06:53

Can I bound an st_distance call by a polygon?

3 Answers3

Linked