I am trying to read NetCDF files, do some raster processing, and convert them to a specific format using the R doParallel package. I provide only a short reproducible version here, using data from the raster package. This piece of code runs perfectly on my PC.
library(raster)
library(sp)
library(doParallel)
library(foreach)
## specify the variables to download
vars = c('tmin', 'tmax', 'prec')
## specify the workdir
workdir <- getwd()
cl <- makeCluster(4)
registerDoParallel(cl)
start_time <- Sys.time()
## start the outer foreach loop (one task per variable)
foreach(var = vars, .packages = c('raster','doParallel','foreach','base','naturalsort','ncdf4')) %dopar% {
  file <- getData("worldclim", var = var, res = 10, lon = 5, lat = 45)
  ## start the inner foreach loop (one task per monthly layer)
  foreach(i = 1:12, .packages = c('raster','doParallel','foreach','base','naturalsort','ncdf4')) %dopar% {
    ## read the raster layer
    ras <- file[[i]]
    ## build the temporary GeoTIFF path inside workdir (file.path adds the separator)
    tmpfile <- file.path(workdir, paste0(gsub(".nc", "", var), "tempfile.tif"))
    ## write the raster as a GeoTIFF
    writeRaster(ras, tmpfile, format = "GTiff", overwrite = TRUE)
    ## create the output file name
    outfile <- file.path(workdir, paste0(gsub(".nc", "", var), "_", i, ".map"))
    ## use gdal_translate to convert the GeoTIFF to PCRaster format
    system(paste("gdal_translate -of PCRaster", tmpfile, outfile))
  }
}
stopCluster(cl)
end_time <- Sys.time()
end_time - start_time
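One thing that may be worth ruling out is the shared temporary file: every month of a given variable reuses the same tempfile.tif name. Below is a minimal sketch, not part of the code I ran above, of giving each (variable, month) task its own file names. Here task_paths is a hypothetical helper and tempdir() only stands in for the real working directory.

```r
# Build one row per (variable, month) task so a single loop can cover all of them.
vars <- c("tmin", "tmax", "prec")
tasks <- expand.grid(var = vars, month = 1:12, stringsAsFactors = FALSE)

# Hypothetical helper: derive per-task file names so no two tasks share a temp file.
task_paths <- function(workdir, var, month) {
  stem <- paste0(var, "_", month)
  list(
    tmp = file.path(workdir, paste0(stem, "_tempfile.tif")),
    out = file.path(workdir, paste0(stem, ".map"))
  )
}

# tempdir() is a placeholder for the real working directory (assumption).
paths <- lapply(seq_len(nrow(tasks)), function(k) {
  task_paths(tempdir(), tasks$var[k], tasks$month[k])
})
```

Each row of tasks could then drive a single foreach(k = seq_len(nrow(tasks))) %dopar% loop instead of two nested ones, although I have not verified whether this changes the behaviour on the Threadripper machine.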
I have an AMD Ryzen Threadripper 2990WX processor (32 cores, 64 threads) and 128 GB of RAM.
However, if I use a large number of rasters (10,000), as in my real use case, the same piece of code fails with the error:
Error in unserialize(socklist[[n]]) : error reading from connection
The R code runs for a while and generates some files. Then the RStudio front end stops and shows the error above. In the background, however, files are still being generated, though not for all variables. For instance, when I created a cluster of 4 workers, 2 stopped after a while and the other 2 kept generating files even though RStudio had already reported the error. While this is happening, my PC sometimes crashes and restarts automatically.
I also checked the log file to see whether something is wrong, as mentioned here and here. I get the following log messages when I use makeCluster(4, outfile = "log.txt"):
starting worker pid=7276 on localhost:11852 at 12:34:27.693
starting worker pid=1396 on localhost:11852 at 12:34:27.846
starting worker pid=8060 on localhost:11852 at 12:34:27.990
starting worker pid=13648 on localhost:11852 at 12:34:28.140
Loading required package: sp
Loading required package: sp
Loading required package: sp
Loading required package: sp
Loading required package: foreach
Loading required package: foreach
Loading required package: iterators
Loading required package: iterators
Loading required package: foreach
Loading required package: foreach
Loading required package: parallel
Loading required package: parallel
Loading required package: iterators
Loading required package: iterators
Loading required package: parallel
Loading required package: parallel
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
In addition: Warning messages:
1: package 'raster' was built under R version 3.4.4
2: package 'sp' was built under R version 3.4.4
3: package 'naturalsort' was built under R version 3.4.4
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
In addition: Warning messages:
1: package 'raster' was built under R version 3.4.4
2: package 'sp' was built under R version 3.4.4
3: package 'naturalsort' was built under R version 3.4.4
Execution halted
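The log above shows only that the connection broke, not which task was running at the time. A minimal sketch of how per-task failures could be recorded is given below, assuming a small 2-worker cluster; process_task is a hypothetical stand-in for the real raster work and fails on purpose for one task. Note that tryCatch only catches R-level errors: a worker process that dies outright, as the "Execution halted" lines suggest, would still break the connection.

```r
library(doParallel)  # also attaches foreach and parallel

# outfile is the makeCluster argument that redirects worker output to a file.
log_file <- file.path(tempdir(), "log.txt")
cl <- makeCluster(2, outfile = log_file)
registerDoParallel(cl)

# Hypothetical stand-in for the per-raster work; task 3 fails deliberately.
process_task <- function(k) {
  if (k == 3) stop("simulated failure in task ", k)
  k^2
}

# Each task returns a small record, so a failure is reported instead of aborting the loop.
results <- foreach(k = 1:4) %dopar% {
  tryCatch(
    list(task = k, ok = TRUE, value = process_task(k)),
    error = function(e) list(task = k, ok = FALSE, value = conditionMessage(e))
  )
}
stopCluster(cl)

# Keep only the records of tasks that raised an R error.
failed <- Filter(function(r) !r$ok, results)
```

Inspecting failed afterwards would at least show whether the tasks that stop are raising ordinary R errors or whether the workers are being killed from outside R.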
If I run the same piece of code with similar versions of R, the packages, and RStudio on another Windows machine (Intel(R) Core(TM) i7-6700 CPU @ 3.4GHz), the code works perfectly.
Is this something related to AMD Ryzen processors?