
I need to do calculations on a lot of big rasters (28,000 cells, 181 layers). I tried my code on a small subset (24 cells, 181 layers) and took help from this forum to optimize it as much as I could.

I used bricks from the raster package because I read that bricks are loaded into memory and are faster to process. But some suggest that terra is better and easier than raster.

I ran my code with both packages and raster is quicker in my case (9 min compared with 16 min). That is on the small dataset; when I run it on the original data the computer takes forever. During the process, CPU and RAM use sit around 16% and about 1 GB respectively. Even if my code is inefficient, why isn't R using the available RAM and CPU?

What I'm trying to do is implement a model in which I interpolate Landsat NDVI to a daily timestep using a spline, then calculate different variables using mathematical equations. Some are simple and straightforward, but some are quite complex.
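
For the interpolation step, here is a minimal sketch of what I mean, using terra::app to fit a spline through each cell's time series (ndvi, doy, and xout are made-up names and values for illustration):

library(terra)
ndvi <- rast(ncols=5, nrows=5, nlyrs=4, vals=runif(100))
doy  <- c(1, 16, 33, 49)   # hypothetical acquisition days
xout <- seq_len(max(doy))  # target daily timestep
daily <- app(ndvi, function(y) {
    # y holds one cell's values across the layers
    if (anyNA(y)) return(rep(NA_real_, length(xout)))
    stats::spline(doy, y, xout = xout)$y
})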

What I want is an efficient way of calculating these variables.
I would really appreciate it if someone could explain:

  1. Even if my code is inefficient, I believe the computer should still use the maximum available resources. Is there a way to make that happen?
  2. Is the raster package better in my case?
  3. I see some people recommending parallel processing. As I'm not a programmer, I'll have to read up on it before implementing it, though I do know what it is.

Apologies for not presenting the queries as they should have been the first time; I hope this is in better shape now. Thanks.

Here is reproducible terra code (special thanks to Robert Hijmans):

library(terra)
b <- r1 <- r2 <- rast(ncols=5, nrows=5, nlyrs=5, vals=NA)
set.seed(0)
values(b) <- runif(ncell(b) * nlyr(b))
b[c(1,2,3,22,23,24,25)] <- NA

p  <- 0.15
p1 <- p/3
p2 <- p - (p/3)
fc <- 0.3
weather <- c(0.1, 0, 0, 0, 0.3)

# initialize the first layer of r2 (NA where the input is NA)
r2[[1]] <- ifel(is.na(b[[1]]), NA, 0.3)

for (i in 1:nlyr(b)) {
    # r1 layer i depends on r2 layer i
    varr1 <- b[[i]] * (((r2[[i]] - p1) / p2)^2)
    r1[[i]] <- ifel(r2[[i]] > p, b[[i]], varr1)
    # r2 layer k depends on layer k-1 of r1 and r2; note that this
    # inner loop recomputes every layer on each outer iteration
    for (k in 2:nlyr(b)) {
        r2[[k]] <- min(r2[[k-1]] + (weather[k-1] - r1[[k-1]]) / 100, fc)
    }
}
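
Note that the inner k loop recomputes every r2 layer on each pass of the outer loop. Because layer i of r1 only needs layer i of r2, and layer i+1 of r2 only needs layer i of r1 and r2, the same result can be computed in a single pass; a sketch:

for (i in 1:nlyr(b)) {
    varr1 <- b[[i]] * (((r2[[i]] - p1) / p2)^2)
    r1[[i]] <- ifel(r2[[i]] > p, b[[i]], varr1)
    if (i < nlyr(b)) {
        # advance the state one layer instead of recomputing all layers
        r2[[i+1]] <- min(r2[[i]] + (weather[i] - r1[[i]]) / 100, fc)
    }
}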

Here is reproducible raster code:

library(raster)
b <- brick(ncols=5, nrows=5, nl=5)
set.seed(0)
inBrick <- setValues(b, runif(ncell(b) * nlayers(b)))
inBrick[c(1,2,3,22,23,24,25)] <- NA
outBrick1 <- inBrick
outBrick1[] <- NA
outBrick2 <- outBrick1

ini <- 0.3
p  <- 0.15
p1 <- p/3
p2 <- p - (p/3)
fc <- 0.3
var1 <- which(!is.na(inBrick[[1]][]))

outBrick2[[1]][var1] <- ini
### now outBrick2 has initial values in 1st layer
weather <- c(0.1, 0, 0, 0, 0.3)
var3 <- 1:ncell(inBrick)  # preallocated for the cell loop below

### outBrick1 calculations
for (i in 1:nlayers(inBrick)) {
    varr1 <- inBrick[[i]][] * (((outBrick2[[i]][] - p1) / p2)^2)
    for (j in 1:ncell(inBrick)) {
        if (!is.na(outBrick2[[i]][j])) {
            if (outBrick2[[i]][j] > p) {
                outBrick1[[i]][j] <- inBrick[[i]][j]
            } else {
                outBrick1[[i]][j] <- varr1[j]
            }
        }
    }
    ### outBrick2 calculations
    for (k in 2:nlayers(inBrick)) {
        var2 <- outBrick2[[k-1]][] + (weather[k-1] - outBrick1[[k-1]][]) / 100
        for (l in 1:ncell(inBrick)) {
            var3[l] <- min(fc, var2[l])
        }
        outBrick2[[k]][] <- var3
    }
}
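
The per-cell j and l loops are what make this version slow. A sketch of the same calculation with those loops vectorized (base ifelse reproduces the NA guard because it propagates NA, and pmin replaces the per-cell min):

for (i in 1:nlayers(inBrick)) {
    v2 <- outBrick2[[i]][]
    outBrick1[[i]][] <- ifelse(v2 > p, inBrick[[i]][],
                               inBrick[[i]][] * (((v2 - p1) / p2)^2))
    if (i < nlayers(inBrick)) {
        outBrick2[[i+1]][] <- pmin(fc, v2 + (weather[i] - outBrick1[[i]][]) / 100)
    }
}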

2 Answers


"terra" is generally faster, sometimes much faster. nukubiho points out that with arithmetic computations "raster" may be faster than "terra". However, the difference is generally small, and may only be true when the cell values are in memory.

However, this may not hold for large datasets (where these differences may actually matter). The cell values for these datasets are typically in a file.

In memory:

library("raster")
library("terra")

n = 12000
# two 12000 x 12000 single-layer rasters, values held in memory
x_terra = rast(nrows = n, ncols = n, vals = rnorm(n ^ 2))
y_terra = rast(nrows = n, ncols = n, vals = rnorm(n ^ 2))

x_raster = raster(x_terra)
y_raster = raster(y_terra)
r <- c(x_terra, y_terra)  # two-layer SpatRaster for app()

system.time({ (x_raster - y_raster) / (x_raster + y_raster) })
#   user  system elapsed 
#   1.83    0.36    2.19 
system.time({ (x_terra - y_terra) / (x_terra + y_terra) })
#   user  system elapsed 
#   2.66    2.25    4.91 
system.time(app(r, \(x) (x[,1]-x[,2]) / (x[,1] + x[,2])))
#   user  system elapsed 
#   4.06    2.17    6.25 

"raster" is faster. But when the values are in files, "terra" is faster.

x_terra = writeRaster(x_terra, "test1.tif", overwrite=TRUE)
y_terra = writeRaster(y_terra, "test2.tif", overwrite=TRUE)
x_raster = raster(x_terra)
y_raster = raster(y_terra)
r <- c(x_terra, y_terra)

system.time({ (x_raster - y_raster) / (x_raster + y_raster) })
#   user  system elapsed 
#  17.25    5.39   22.66 
system.time({ (x_terra - y_terra) / (x_terra + y_terra) })
#   user  system elapsed 
#  13.63    4.92   18.61 
system.time(app(r, \(x) (x[,1]-x[,2]) / (x[,1] + x[,2])))
#   user  system elapsed 
#   9.78    3.46   13.24 
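
You can confirm where the values live; both packages provide inMemory(), and after writeRaster the terra values should come from the file:

inMemory(x_terra)   # expected FALSE: values are now read from test1.tif
inMemory(x_raster)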
Robert Hijmans

Generally, terra is faster than raster, but one exception is in-memory arithmetic calculations. The reason is probably that raster uses heavily optimized R code for this case. See this simple example:

library("raster")
library("terra")

n = 12000
x_terra = rast(nrows = n, ncols = n, vals = rnorm(n ^ 2))
y_terra = rast(nrows = n, ncols = n, vals = rnorm(n ^ 2))
x_raster = raster(x_terra)
y_raster = raster(y_terra)

## terra
system.time({ (x_terra - y_terra) / (x_terra + y_terra) })
#> user  system elapsed
#> 2.63    2.22    4.86

## raster
system.time({ (x_raster - y_raster) / (x_raster + y_raster) })
#> user  system elapsed
#> 1.78    0.26    2.05

If you want to use more CPU and RAM, you could split the data into blocks and process them in parallel; see these questions (and the sketch after them):

  1. terra package returns error when try to run parallel operations
  2. Process sets of rasters in parallel using lapp function from terra package
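
For instance, terra's app() takes a cores argument that runs the computation on a small cluster of workers. A minimal sketch (the x[,1]/x[,2] form matches how app() passes layer columns, as in the answer above; the raster size here is arbitrary):

library(terra)
r <- rast(nrows = 2000, ncols = 2000, nlyrs = 2, vals = rnorm(8e6))
out <- app(r, \(x) (x[,1] - x[,2]) / (x[,1] + x[,2]), cores = 4)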
nukubiho