I have large monthly NetCDF data for many years where each .nc file has 8 layers. I want to calculate the median value of each month in each year using the first layer in each .nc file only.
My current for loop is:
library(raster)
library(ncdf4)
setwd("path")
# list all sub-folders of the working directory
sub <- list.dirs(full.names=FALSE, recursive=FALSE)
# read and create a matrix of file names
fn <- list.files(path=sub, recursive=TRUE, full.names=TRUE, pattern="\\.nc$") # pattern is a regular expression, not a glob
fn.mat <- matrix(fn, nrow = 155)
# select .nc files
mid <- c(1, 14, seq(26, 155, 13))
# subset the matrix of file names
fn.mat.sub <- fn.mat[mid, ]
# create indices for stackApply function to calculate the median using the first layers
layers <- rep(1:8, 113)
# loop through the whole file name matrix
out <- list() # empty list to store the output (named "out" to avoid masking base::ls)
for (i in seq_len(ncol(fn.mat.sub))) {
  for (ii in seq_len(nrow(fn.mat.sub))) {
    # leave out column i, i.e. the month of the year being cross-validated
    s <- stack(fn.mat.sub[ii, -i])
    # median per layer index across the remaining files
    m <- stackApply(s, indices = layers, fun = median, na.rm = TRUE)
    # keep only the median of the first layers
    out[[length(out) + 1]] <- m[[1]]
  }
}
But the processing is extremely slow. I know that for loops are often discouraged in R and that the "apply" family of functions (apply, lapply, sapply, vapply, etc.) is often used instead, but I don't know how to use them in this case. I believe there must be a better way to reduce the processing time.
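For reference, a minimal and unbenchmarked sketch of how the loop above might be written with lapply. It reads only the first band of each file with raster(file, band = 1), so the other seven layers are never loaded or processed. The function names median_of_band and loo_medians are hypothetical, and the sketch assumes the rows of the file name matrix are months and the columns are years.

```r
library(raster)

# Median of a single band across a set of files, reading only that band.
# (median_of_band is a hypothetical helper name.)
median_of_band <- function(files, band = 1) {
  layers <- lapply(files, raster, band = band)  # one RasterLayer per file
  calc(stack(layers), fun = function(x) median(x, na.rm = TRUE))
}

# Leave-one-out medians for a matrix of file names.
# Assumption: rows are months and columns are years, with at least
# three columns so that two or more files remain after leaving one out.
loo_medians <- function(file_mat, band = 1) {
  # month varies fastest, which matches the order of the nested loop
  idx <- expand.grid(month = seq_len(nrow(file_mat)),
                     year  = seq_len(ncol(file_mat)))
  lapply(seq_len(nrow(idx)), function(k) {
    median_of_band(file_mat[idx$month[k], -idx$year[k]], band = band)
  })
}

# Usage with the objects defined above:
# out <- loo_medians(fn.mat.sub)
```

Any speed-up here would come from skipping the unused layers rather than from lapply itself, which by itself is not necessarily faster than a for loop.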
Any ideas to speed up this processing? Thanks!