I'm using doMPI in R to parallelize saving netCDF climate data. The data is stored in R in a 4-dimensional array m, holding 6 variables at 20000 timepoints over a latitude/longitude grid; m is thus indexed as m[lon, lat, time, variable]. Based on how netCDF stores its data on disk, the most efficient way to write the data is one timeslice at a time, so I'd like to iterate over m by timeslice for each variable. Currently, my code looks like this:
ntime <- 20000
output.vars <- list("rainfall", "snowfallwateq", "snowmelt", "newsnow", "snowdepth", "swe")

for (var.index in seq_along(output.vars)) {
  ncout <- nc_open(output.files[var.index], write=TRUE)
  val <- foreach(time.index=1:ntime, .packages=c("ncdf4")) %dopar% {
    ncvar_put(ncout, output.vars[[var.index]],
              vals=m[,,time.index,var.index],
              start=c(1, 1, time.index),
              count=c(nlon, nlat, 1))
  }
  nc_close(ncout)
}
This unnecessarily copies the entire m array to each worker, which takes a prohibitively large amount of memory, and I need to reduce the amount of copied data. What I took from this answer is that I could iterate over m by timeslice, so that only the data for the current timeslice is copied to each worker at each iteration. The foreach construct allows multiple objects to be iterated over simultaneously, so I could even have the time index alongside the matrix timeslice without a problem. Unfortunately, I don't know of any way to iterate over an array by timeslice. Is there a way to do so, such that at each iteration t of the foreach loop for variable var, I have a variable data holding the 2-dimensional matrix m[,,t,var]?
I've already tried the intuitive approach below, but it iterates over each individual element rather than over an entire timeslice at a time.
val <- foreach(time.index=1:ntime, slice=m[,,,var], ...
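For reference, here is a self-contained sequential sketch (toy dimensions, no parallelism or netCDF) of the access pattern I'm after: each iteration should see only one 2-dimensional timeslice of the array, which is what I'd like each worker to receive.

```r
# Toy-sized stand-in for m: 3 lon x 2 lat x 4 time x 6 variables
nlon <- 3; nlat <- 2; ntime <- 4; nvar <- 6
m <- array(seq_len(nlon * nlat * ntime * nvar),
           dim = c(nlon, nlat, ntime, nvar))

var <- 1
for (t in 1:ntime) {
  # data is the nlon x nlat matrix for timeslice t of variable var;
  # this is the only piece of m an iteration actually needs
  data <- m[, , t, var]
  stopifnot(all(dim(data) == c(nlon, nlat)))
}
```

The parallel version would ideally ship only `data` (plus `t`) to each worker instead of the whole of m.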