Efficiently processing large spatial data in r

Question

I am struggling with efficient data handling in the following code. The code produces my desired outcome, a data frame with two variables, but it is incredibly slow because the spatial datasets are very large and because I am an R/programming beginner. The code has been running for about 5 hours and has looped through only about 80 of the 488 total.

The following code is intended to identify the maximum number of buildings intersected by simulated wildfires in each of 488 firesheds. Firesheds are spatially contiguous polygons, each about 100-300K acres. There are well over a million simulated wildfires stored in 25 different shapefiles corresponding to geographic planning units called fire occurrence areas (FOA) which are 4-6 million acres on average. Firesheds are small, community-level polygons and there may be several dozen of them within larger, regional FOAs.

I would like to know how I can improve my method in order to significantly reduce processing time. Thanks in advance!

Datasets to be used

Firesheds are the unit of analysis. There are 488 firesheds (polygons)*

PRJ <- st_crs(st_read("./Inputs/QWRAOriginal/FOA_perims/Original/foa401rfV3_Perims.shp"))

fshed <- st_read("./Inputs/Firesheds/QWRA_Firesheds.shp") %>% st_transform(crs=PRJ)

This is large dataset of Microsoft Building Footprints. It is 1.8 GB

MBF_PNW <- st_read("./Inputs/MBF/MBF_PNW.shp")

Simulated fires are stored in 25 unique shapefiles associated with numbered fire occurrence areas (FOA).

FOA_table <- data.frame(FOA=c(401:412, 413, 413, 413,414:423),
                        File=list.files("./Inputs/QWRAOriginal/FOA_perims/Original/", pattern = ".shp$", full.names = TRUE))

#These polygons define the FOAs 
Bndry <- st_read("./Inputs/QWRAOriginal/FOA_boundaries.shp")

# empty output for storing loop products
outputs_table <- data.frame(Fireshed_N = NULL, Max_bldgs = NULL)

In the following loop: for each fireshed, identify any FOA it intersects in order to determine which simulated fires to read in. In most cases it will be only one FOA, but in some cases firesheds may be located on the boundary between two or more FOAs. Then after, identifying the relevant simulated fires, find all the fires that actually intersect the fireshed and count how many buildings were intersected by each fire. I'm only concerned with the maximum number of buildings intersected by a single fire, which I then store in the output data frame along with the fireshed name

for (i in 1:488){
  shp <- fshed[i,] #identify the polygon that represents fireshed 'i'

  foa_int <- Bndry %>% filter(st_intersects(., shp, sparse=FALSE)) %>% st_drop_geometry() %>% dplyr::select(FOA_number) %>% deframe() 
  
  tbl <- FOA_table %>% filter(FOA %in% c(unique(foa_int)))
  
  for (k in nrow(tbl)){
    fires <- st_read(tbl$File[k]) %>% filter(st_intersects(., shp, sparse=FALSE)) %>% 
      st_buffer(0) %>% st_set_precision(0.05) %>% st_make_valid() %>%  st_intersection(shp %>% st_make_valid) %>% 
      mutate(bldg_count=lengths(st_intersects(.,MBF_PNW))) %>% st_drop_geometry() 
    
    outputs_table <- rbind(outputs_table, data.frame(Fireshed_N=shp$Frshd_N, Max_bldgs=max(fires$bldg_count))) 
  }
  if(any(i==c(61,122,183,244,305,366,427,488))){
    write.csv(outputs_table, "./Outputs/Fshed_MBF.csv")* periodically save the output*
  }
  cat('Completed Fireshed',i) # add \n to the code
}

score 1 · Answer 1 · answered Jan 13 '21 at 01:57

https://github.com/rstudio/profvis

To run code with profiling, wrap the expression in profvis(). This will result in the interactive profile visualizer. Sample code:

library(profvis)
library(ggplot2)

p <- profvis({
  g <- ggplot(diamonds, aes(carat, price)) + geom_point(size = 1, alpha = 0.2)
  print(g)
})


# View it with:
p
# or print(p)

I would modify the for loop to run once and profile the function to figure out the bottlenecks. (I am guessing it will be this command st_intersects(.,MBF_PNW))

Can you pre-compute any data?

Can you filter or subset your data, in particular, "./Inputs/MBF/MBF_PNW.shp". See R/GIS: How to subset a shapefile by a lat-long bounding box? .

Would decreasing the precision of %>% st_set_precision(0.05) %>% reduce run time?

Good luck!

Efficiently processing large spatial data in r

1 Answers1