Efficiently create many polygons

Question

I want to create polygons inside an apply and want to do this as quickly as possible from a matrix of coordinates. I have some code and realized this is one of the slowest parts of my code. How can I do this efficiently? I tried two different approaches:

Approach 1

library(sp)
library(terra)    
t0 <- Sys.time()

poly_list <- apply(matrix(1:10000), 1, function(idx){      
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))      
  # create polygon
  Polygons(list(Polygon(coords)), idx)
})
  
# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))

# show time passed
print(Sys.time() - t0)   
# Time difference of 2.082166 secs

Approach 2

t0 <- Sys.time()    
poly_list <- apply(matrix(1:10000), 1, function(idx){      
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))      
  # create polygon
  vect(coords, type = "polygon")
})

# convert to terra polygons
poly_terra <- vect(poly_list)

print(Sys.time() - t0)
# Time difference of 16.38044 secs

Why is it faster to create sp polygons and convert them afterwards than directly creating terra polygons? The code with vect(SpatialPolygons(Polygons(list(Polygon(coords)), idx))) seems somewhat complicated. Is there a faster or at least more elegant way?

Edit Currently my fastest option, although it feels illegal:

t0 <- Sys.time()    
dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")   
poly_list <- apply(matrix(1:10000), 1, function(idx){
  
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))     
  # create polygon
  new <- dummy
  new@ID <- as.character(idx)
  new@Polygons[[1]]@coords <- coords
  return(new)
})

# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))
print(Sys.time() - t0)
# Time difference of 0.7147191 secs

Robert Hijmans · Accepted Answer · 2022-08-10T10:39:47.083

This is faster than your examples

t0 <- Sys.time()    
p <- lapply(1:10000, function(idx){
  cbind(id=idx, x=rnorm(100), y=rnorm(100))
})
p <- do.call(rbind, p)
v <- vect(p, "polygons")
print(Sys.time() - t0)
#Time difference of 0.483578 secs

This uses lapply and you state that you want to use apply; but in the context of your example apply does not seem to be a good choice.

I do not see much performance difference between your two sp approaches. Below I use a streamlined version of the one you say is fastest and benchmark it with my approach:

with_terra <- function() {
   p <- lapply(1:10000, function(idx){
      cbind(id=idx, x=rnorm(100), y=rnorm(100))
   })
   p <- do.call(rbind, p)
   vect(p, "polygons")
}

with_sp <- function() {
   dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")   
   poly_list <- apply(matrix(1:10000), 1, function(idx){
     dummy@ID <- as.character(idx)
     dummy@Polygons[[1]]@coords <- cbind(rnorm(100), rnorm(100))
     dummy
   })
   vect(SpatialPolygons(poly_list))
}

bm <- microbenchmark::microbenchmark(
  sp = with_sp(),
  terra = with_terra(),
  times = 10
)

bm
#Unit: milliseconds
#  expr      min       lq     mean   median       uq       max neval
#    sp 836.8434 892.8411 930.5261 935.3788 968.2724 1039.2840    10
# terra 261.2191 276.0770 298.3603 282.7462 296.3674  437.0505    10

Nice! So basically you're making use of matrices as long as possible and create your final object just in the final step? Apparently, I should have a look on `do.call()`. Ok, this totally pulverized my take. However, `vect(p, "polygons")` fails with `Error: [vect] not an appropriate matrix` when I try to run your example. — dimfalk, Aug 10 '22 at 08:15
For me, this code works without anny issues. Thank you! I did not know you could create multiple polygons in `terra` using a matrix! — Zoe, Aug 10 '22 at 09:55
@falk-env: yes, and you probably need to update "terra" for this to work — Robert Hijmans, Aug 10 '22 at 10:29

dimfalk · Answer 2 · 2022-08-09T14:58:19.223

I'm not really sure if this will be of help, but I had some good time experimenting and fine-tuning and thought I'll share my preliminary results at least.

Foremost, let me share some input for further reading:

This article was the starting point for some code optimization I worked on some time ago: FasteR! HigheR! StrongeR!
Maybe it's just me, but I prefer for-loops over apply and this approach does not seem to be slower (c.f. here). On the contrary, the median execution time of your first approach was ~0.12 s faster on my machine after I used a loop instead, but maybe there is another reason for you using apply here.
If you choose to go for a loop, here is another great guide how to reduce execution time.
Making use of namespaces actually slows down your code (c.f. here), so better attach the packages you are going to use - like you did.
The native pipe does not seem to have any overhead (c.f. here), so this might be a great way to un-nest your functions and make them look tidier without penalties.
For timing purposes, I came across {tictoc} some time ago as a nice implementation of Sys.time() - t0 from my point of view, for actual benchmarking, {benchmarking} is great.

Noam Ross suggests to find better (= faster) packages for your purpose. You already noticed {sp} operates faster than {terra} with your example. Let me present a third option:

library(sp)
library(terra)
library(sf)

# first node has to be equal to the last node for a polygon to be closed
coords <- cbind(rnorm(99), rnorm(99))
coords <- rbind(coords, coords[1, ])

mbm <- microbenchmark::microbenchmark(
  
  sp = Polygon(coords) |> list() |> Polygons(1),
  terra = vect(coords, type = "polygons"),
  sf = list(coords) |> st_polygon(),
  times = 100
)

ggplot2::autoplot(mbm)

If your target object has to be of class SpatVector, you may consider applying terra::vect() once as a final step. However, what exactly is your goal once you created your polygon objects? This might affect which package / workflow to use. E.g. if you only need geometries in a specific order, you might drop attributes etc.

Considering your third approach - it should not feel illegal from my point of view, pre-allocating objects and exchanging some attributes seems like a smart move to do - a condensate might encompass loops, pipes and maybe the {sf} package, whereat I failed implementing the latter at the moment, but at least I did not slow down your code so far:

# your take on this
illegal_approach_a <- function() {
  
  dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")
  
  poly_list <- apply(matrix(1:10000), 1, function(idx){
    
    # set coordinates
    coords <- cbind(rnorm(100), rnorm(100))
    
    # create polygon
    new <- dummy
    new@ID <- as.character(idx)
    new@Polygons[[1]]@coords <- coords
    return(new)
  })
  
  # convert to terra polygons
  poly_terra <- vect(SpatialPolygons(poly_list))
}

# my take on this
illegal_approach_b <- function() {
  
  dummy <- cbind(rep(0, 4), rep(0, 4)) |> Polygon() |> list() |> Polygons("0")
  
  poly_list <- list()
  
  for (i in 1:10000) {
    
    # set coordinates
    coords <- cbind(rnorm(100), rnorm(100))
    
    # create polygon
    new <- dummy
    new@ID <- as.character(i)
    new@Polygons[[1]]@coords <- coords
    
    poly_list[[i]] <- new
  }
  
  # convert to terra polygons
  poly_terra <- SpatialPolygons(poly_list) |> vect()
}

mbm <- microbenchmark::microbenchmark(
  
  your_take = illegal_approach_a(),
  my_take = illegal_approach_b(),
  times = 100
)

ggplot2::autoplot(mbm)

Thank you for you answer! I did not have time today, but I will definitely check out the links you sent. I always thought for-loops are the root of all evil in R, at least concerning computation time. — Zoe, Aug 09 '22 at 14:34
No worries. Yeah, that's what I was told a lot some years ago too, but it seems like `microbenchmark` proves them wrong? However, I can think of "bad" loops with a lot of unnecessary re-initializations, without any vectoriazation, nested loops, etc making the overall code quite slow below the line. — dimfalk, Aug 09 '22 at 19:00
loops *can* be evil. Try adding 1 to a very large numeric vector with and without a loop. In contrast, `apply` is just a convenient way to do a loop. — Robert Hijmans, Aug 10 '22 at 06:23

Efficiently create many polygons

2 Answers2