0

I want to create polygons inside an apply and want to do this as quickly as possible from a matrix of coordinates. I have some code and realized this is one of the slowest parts of my code. How can I do this efficiently? I tried two different approaches:

Approach 1

library(sp)
library(terra)    
t0 <- Sys.time()

poly_list <- apply(matrix(1:10000), 1, function(idx){      
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))      
  # create polygon
  Polygons(list(Polygon(coords)), idx)
})
  
# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))

# show time passed
print(Sys.time() - t0)   
# Time difference of 2.082166 secs

Approach 2

t0 <- Sys.time()    
poly_list <- apply(matrix(1:10000), 1, function(idx){      
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))      
  # create polygon
  vect(coords, type = "polygon")
})

# convert to terra polygons
poly_terra <- vect(poly_list)

print(Sys.time() - t0)
# Time difference of 16.38044 secs

Why is it faster to create sp polygons and convert them afterwards than directly creating terra polygons? The code with vect(SpatialPolygons(Polygons(list(Polygon(coords)), idx))) seems somewhat complicated. Is there a faster or at least more elegant way?

Edit Currently my fastest option, although it feels illegal:

t0 <- Sys.time()    
dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")   
poly_list <- apply(matrix(1:10000), 1, function(idx){
  
  # set coordinates
  coords <- cbind(rnorm(100), rnorm(100))     
  # create polygon
  new <- dummy
  new@ID <- as.character(idx)
  new@Polygons[[1]]@coords <- coords
  return(new)
})

# convert to terra polygons
poly_terra <- vect(SpatialPolygons(poly_list))
print(Sys.time() - t0)
# Time difference of 0.7147191 secs
Zoe
  • 906
  • 4
  • 15

2 Answers2

2

This is faster than your examples

t0 <- Sys.time()    
p <- lapply(1:10000, function(idx){
  cbind(id=idx, x=rnorm(100), y=rnorm(100))
})
p <- do.call(rbind, p)
v <- vect(p, "polygons")
print(Sys.time() - t0)
#Time difference of 0.483578 secs

This uses lapply and you state that you want to use apply; but in the context of your example apply does not seem to be a good choice.

I do not see much performance difference between your two sp approaches. Below I use a streamlined version of the one you say is fastest and benchmark it with my approach:

with_terra <- function() {
   p <- lapply(1:10000, function(idx){
      cbind(id=idx, x=rnorm(100), y=rnorm(100))
   })
   p <- do.call(rbind, p)
   vect(p, "polygons")
}

with_sp <- function() {
   dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")   
   poly_list <- apply(matrix(1:10000), 1, function(idx){
     dummy@ID <- as.character(idx)
     dummy@Polygons[[1]]@coords <- cbind(rnorm(100), rnorm(100))
     dummy
   })
   vect(SpatialPolygons(poly_list))
}

bm <- microbenchmark::microbenchmark(
  sp = with_sp(),
  terra = with_terra(),
  times = 10
)

bm
#Unit: milliseconds
#  expr      min       lq     mean   median       uq       max neval
#    sp 836.8434 892.8411 930.5261 935.3788 968.2724 1039.2840    10
# terra 261.2191 276.0770 298.3603 282.7462 296.3674  437.0505    10
Robert Hijmans
  • 40,301
  • 4
  • 55
  • 63
  • Nice! So basically you're making use of matrices as long as possible and create your final object just in the final step? Apparently, I should have a look on `do.call()`. Ok, this totally pulverized my take. However, `vect(p, "polygons")` fails with `Error: [vect] not an appropriate matrix` when I try to run your example. – dimfalk Aug 10 '22 at 08:15
  • For me, this code works without anny issues. Thank you! I did not know you could create multiple polygons in `terra` using a matrix! – Zoe Aug 10 '22 at 09:55
  • @falk-env: yes, and you probably need to update "terra" for this to work – Robert Hijmans Aug 10 '22 at 10:29
  • @ Yup, it's working now using `terra 1.6-7`. Thanks! – dimfalk Aug 10 '22 at 21:13
1

I'm not really sure if this will be of help, but I had some good time experimenting and fine-tuning and thought I'll share my preliminary results at least.

Foremost, let me share some input for further reading:

  • This article was the starting point for some code optimization I worked on some time ago: FasteR! HigheR! StrongeR!
  • Maybe it's just me, but I prefer for-loops over apply and this approach does not seem to be slower (c.f. here). On the contrary, the median execution time of your first approach was ~0.12 s faster on my machine after I used a loop instead, but maybe there is another reason for you using apply here.
  • If you choose to go for a loop, here is another great guide how to reduce execution time.
  • Making use of namespaces actually slows down your code (c.f. here), so better attach the packages you are going to use - like you did.
  • The native pipe does not seem to have any overhead (c.f. here), so this might be a great way to un-nest your functions and make them look tidier without penalties.
  • For timing purposes, I came across {tictoc} some time ago as a nice implementation of Sys.time() - t0 from my point of view, for actual benchmarking, {benchmarking} is great.

Noam Ross suggests to find better (= faster) packages for your purpose. You already noticed {sp} operates faster than {terra} with your example. Let me present a third option:

library(sp)
library(terra)
library(sf)

# first node has to be equal to the last node for a polygon to be closed
coords <- cbind(rnorm(99), rnorm(99))
coords <- rbind(coords, coords[1, ])

mbm <- microbenchmark::microbenchmark(
  
  sp = Polygon(coords) |> list() |> Polygons(1),
  terra = vect(coords, type = "polygons"),
  sf = list(coords) |> st_polygon(),
  times = 100
)

ggplot2::autoplot(mbm)

If your target object has to be of class SpatVector, you may consider applying terra::vect() once as a final step. However, what exactly is your goal once you created your polygon objects? This might affect which package / workflow to use. E.g. if you only need geometries in a specific order, you might drop attributes etc.

Considering your third approach - it should not feel illegal from my point of view, pre-allocating objects and exchanging some attributes seems like a smart move to do - a condensate might encompass loops, pipes and maybe the {sf} package, whereat I failed implementing the latter at the moment, but at least I did not slow down your code so far:

# your take on this
illegal_approach_a <- function() {
  
  dummy <- Polygons(list(Polygon(cbind(rep(0,4), rep(0,4)))), "0")
  
  poly_list <- apply(matrix(1:10000), 1, function(idx){
    
    # set coordinates
    coords <- cbind(rnorm(100), rnorm(100))
    
    # create polygon
    new <- dummy
    new@ID <- as.character(idx)
    new@Polygons[[1]]@coords <- coords
    return(new)
  })
  
  # convert to terra polygons
  poly_terra <- vect(SpatialPolygons(poly_list))
}

# my take on this
illegal_approach_b <- function() {
  
  dummy <- cbind(rep(0, 4), rep(0, 4)) |> Polygon() |> list() |> Polygons("0")
  
  poly_list <- list()
  
  for (i in 1:10000) {
    
    # set coordinates
    coords <- cbind(rnorm(100), rnorm(100))
    
    # create polygon
    new <- dummy
    new@ID <- as.character(i)
    new@Polygons[[1]]@coords <- coords
    
    poly_list[[i]] <- new
  }
  
  # convert to terra polygons
  poly_terra <- SpatialPolygons(poly_list) |> vect()
}

mbm <- microbenchmark::microbenchmark(
  
  your_take = illegal_approach_a(),
  my_take = illegal_approach_b(),
  times = 100
)

ggplot2::autoplot(mbm)

dimfalk
  • 853
  • 1
  • 5
  • 15
  • Thank you for you answer! I did not have time today, but I will definitely check out the links you sent. I always thought for-loops are the root of all evil in R, at least concerning computation time. – Zoe Aug 09 '22 at 14:34
  • No worries. Yeah, that's what I was told a lot some years ago too, but it seems like `microbenchmark` proves them wrong? However, I can think of "bad" loops with a lot of unnecessary re-initializations, without any vectoriazation, nested loops, etc making the overall code quite slow below the line. – dimfalk Aug 09 '22 at 19:00
  • loops *can* be evil. Try adding 1 to a very large numeric vector with and without a loop. In contrast, `apply` is just a convenient way to do a loop. – Robert Hijmans Aug 10 '22 at 06:23