I have four quite large RasterStacks and would like to do some simple calculations on them. How can I speed up these calculations? I found this approach using overlay(), but the calculations still take very long.

My RasterStacks (s1, s2, s3, s4) all have the dimensions 26, 76, 1976, 3805 (nrow, ncol, ncell, nlayers), and my current code looks like this:

out <- overlay(s1, s2, s3, s4, fun = function(rs1, rs2, rs3, rs4) {return((rs1 + rs2 - rs3 - rs4) * 1e3)})

Any ideas?

EDIT: To produce a RasterStack (e.g., s1), you can call the following function:

library(raster)

# build a RasterStack of the given dimensions, filled with random values
create_stack <- function(num.col, num.row, num.lay){
  # one layer with uniform random values in [0, 10]
  r <- raster(matrix(runif(num.row * num.col, 0, 10), ncol = num.col, nrow = num.row),
              xmn = 0, xmx = num.col, ymn = 0, ymx = num.row)
  # replicate that layer num.lay times and stack the copies
  ll <- replicate(num.lay, r)
  return(stack(ll))
}

s1 <- create_stack(76, 26, 3805)
  • Please provide a [reproducible](http://stackoverflow.com/q/5963269/3250126) example. – loki Feb 28 '17 at 13:58
  • You may want to try converting your RasterStacks into a big 3-dimensional array using the `ff` package, then do your calculation on that instead. The `ff` package is more efficient for calculations; however, the transfer from `raster` to `ff` and back to `raster` may take time as well (see the sketch below this thread). – Bastien Mar 08 '17 at 19:38
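
A minimal sketch of that array idea, assuming the stacks fit in memory (for larger-than-RAM data the same pattern would need `ff` arrays instead of plain matrices):

library(raster)
# pull the cell values out as plain matrices (ncell x nlayers),
# do the arithmetic in base R, then write the result back
m   <- (values(s1) + values(s2) - values(s3) - values(s4)) * 1e3
out <- setValues(brick(s1), m)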

2 Answers

Increasing the chunksize helps a lot when dealing with large rasters. I also suggest putting all the temporary files in a directory that you can manage easily.

library(raster)
# a larger chunksize lets raster process more cells per block; tmpdir keeps
# the (potentially large) temporary files in one easily managed place
rasterOptions(tmpdir = "C:\\", tmptime = 24, progress = "text", timer = TRUE,
              overwrite = TRUE, chunksize = 2e8, maxmemory = 1e8)
  • Thanks for your comment and the rasterOptions settings you provided! I changed the rasterOptions in my .Rprofile according to your comment, but I haven't observed a huge performance gain so far... – moremo Mar 01 '17 at 11:41
  • You can also try parallel processing while keeping the `chunksize` up. In my case, increasing the `chunksize` alone cut the processing time in half. – Geo-sp Mar 01 '17 at 23:32

For your specific problem,

out <- (s1 + s2 - s3 - s4) * 1e3

seems to be the fastest way to apply your function.
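
To check this on your own machine, here is a quick comparison with system.time() (a sketch; the 100-layer size is just to make the test finish quickly, as in the comments below):

# smaller stacks so the comparison runs in reasonable time
s1 <- create_stack(76, 26, 100); s2 <- create_stack(76, 26, 100)
s3 <- create_stack(76, 26, 100); s4 <- create_stack(76, 26, 100)

# time overlay() against plain raster arithmetic
system.time(out1 <- overlay(s1, s2, s3, s4,
                            fun = function(a, b, c, d) (a + b - c - d) * 1e3))
system.time(out2 <- (s1 + s2 - s3 - s4) * 1e3)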

However, for other problems you should have a look at the clusterR() function, which allows you to apply functions in parallel.

"For example, it works with calc and it also works with overlay as long as you provide a single RasterStack or RasterBrick as the first argument."

With your function, I created this working example:

create_stack <- function(num.col,num.row,num.lay){
  r <- raster(matrix(runif(num.row*num.col,0,10), ncol=num.col, nrow=num.row),
              xmn=0, xmx=num.col, ymn=0, ymx=num.row )
  ll <- replicate(num.lay , r )
  return(stack(ll))
}


library(raster)
s1 <- create_stack(76,26,3805)
s2 <- create_stack(76,26,3805)
s3 <- create_stack(76,26,3805)
s4 <- create_stack(76,26,3805)


beginCluster()
# one stack goes in as the first argument; the others are passed via 'args'
out <- clusterR(s1, fun = function(x, s2, s3, s4) {return((x + s2 - s3 - s4) * 1e3)},
                args = list(s2 = s2, s3 = s3, s4 = s4), progress = "text")
endCluster()

As a general tip, I have found that calling

beginCluster()

from the raster package at the beginning of a segment with many raster calculations can bring benefits on its own, since many raster functions are already implemented for parallel computation.
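
A minimal sketch of that pattern (the node count and the calc() call are illustrative, not part of the original question):

beginCluster(4)   # start 4 worker nodes; without an argument, beginCluster() guesses the core count
# clusterR() runs a cluster-aware calc(): apply a function to every cell of s1
out <- clusterR(s1, calc, args = list(fun = function(x) x * 1e3))
endCluster()      # release the workers when done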

  • Thanks. I added a function to reproduce RasterStacks of my dimensions. Unfortunately, I can't get your code running. The problem might be related to [this question](http://stackoverflow.com/questions/35369137/clusterr-with-multiple-raster-stacks)?! – moremo Mar 01 '17 at 09:21
  • Thanks, now your code is running. Have you timed it to check whether you get a performance gain? I tested it with just 100 layers and got these measurements for my original approach: `user: 81.791 system: 0.68 elapsed: 84.40`. Using 4 nodes with your code I got `user: 0.125 system: 0.018 elapsed: 95.32`. According to [this answer](http://stackoverflow.com/questions/18654497/mclapply-user-time-larger-than-elapsed-time) the `user` time should be the sum over all nodes, but then where does the longer `elapsed` time with your approach come from? – moremo Mar 01 '17 at 15:14
  • According to [here](http://stackoverflow.com/questions/13688840/what-caused-my-elapsed-time-much-longer-than-user-time), reading from and writing to disk (the raster tmpdir) could be the problem. – moremo Mar 01 '17 at 15:19
  • I checked the timings with `system.time()`. It turned out that simple addition and subtraction of the raster objects performed best; see the first two lines of the answer. – loki Mar 01 '17 at 16:00
  • Thanks a lot for your effort; I marked your answer as (very) useful. I did not accept it, as this is where I stood already a couple of days ago. The approach might be the fastest we have found so far, but on my system it really takes forever. So there must be a better way! – moremo Mar 01 '17 at 16:33