What are alternative ways to speed up the processing time for large NetCDF data in R instead of using the for-loop function?

Question

I have large monthly NetCDF data for many years where each .nc file has 8 layers. I want to calculate the median value of each month in each year using the first layer in each .nc file only.

I wrote the following for loop:

library(raster)
library(ncdf4)
setwd("path")
# go to all sub-folders
sub <- list.dirs(full.names=FALSE, recursive=FALSE)
# read and create a matrix of file names
fn <- list.files(path=sub, recursive=TRUE, full.names=TRUE, pattern="\\.nc$") # pattern is a regex, not a glob
fn.mat <- matrix(fn, nrow = 155)
# select .nc files
mid <- c(1, 14, seq(26, 155, 13))
# subset the matrix of file names
fn.mat.sub <- fn.mat[mid, ]
# create indices for stackApply function to calculate the median using the first layers
layers <- rep(1:8, 113)
# loop through the whole file name matrix
res <- list() # empty list to store the output (named res so it does not mask base::ls)
for (i in seq_len(ncol(fn.mat.sub))) {
  for (ii in seq_len(nrow(fn.mat.sub))) {
    s <- stack(fn.mat.sub[ii, -i]) # leave out the i-th file of this row (the one held out for cross-validation)
    m <- stackApply(s, indices = layers, fun = median, na.rm = TRUE)
    res[[length(res)+1]] <- m[[1]] # keep only the median of the first layers
  }
}

But the processing is extremely slow. I know R doesn’t like “for-loop” and that the “apply” family of functions (such as apply, lapply, sapply, vapply, etc.) is often used instead, but I don’t know how to use it in this case. I believe there are better ways to reduce the processing time here.

Any ideas to speed up this processing? Thanks!

[Using `apply` family only sometimes increases performance, only under certain conditions.](https://stackoverflow.com/a/70023363/5784757). — user438383, Sep 02 '22 at 07:11
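One possible direction, offered as a sketch rather than a tested fix: the loop in the question re-reads every file on each iteration and computes medians for all 8 layer groups before keeping only the first, so the cost is probably dominated by redundant I/O and computation rather than by the for loop itself. The example below reads band 1 of each file once with `raster(file, band = 1)` and then computes leave-one-out medians with `calc()`. The temporary `.grd` files, the 4 x 4 grid, the file count of 5, and the helper name `loo_median` are all hypothetical stand-ins so the code runs without the real NetCDF data; whether `band = 1` selects the intended layer in the actual `.nc` files depends on how their dimensions are organised.

```r
library(raster)

# Hypothetical stand-in data: 5 small files with 8 layers each, written in
# raster's native .grd format so the sketch runs without any NetCDF files.
set.seed(42)
files <- vapply(1:5, function(k) {
  b <- brick(array(runif(4 * 4 * 8), dim = c(4, 4, 8)))
  f <- tempfile(fileext = ".grd")
  writeRaster(b, filename = f)
  f
}, character(1))

# Read only the first band of every file, and do it once instead of
# re-stacking all layers of all files inside the loop.
first_bands <- lapply(files, raster, band = 1)

# Leave-one-out median: drop layer i, then take the cell-wise median of the rest.
loo_median <- function(layers, i) {
  calc(stack(layers[-i]), fun = function(v) median(v, na.rm = TRUE))
}

# One RasterLayer per held-out file.
loo <- lapply(seq_along(first_bands), function(i) loo_median(first_bands, i))
```

With the real data, `files` would be one row of `fn.mat.sub`, and the same `lapply` call could be repeated per row. The `lapply` here is only a convenience; a for loop over the pre-read layers should perform about the same.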
