I am trying to read NetCDF files, do some raster processing, and convert the results to a specific format using the R doParallel package. Below is a short reproducible version that uses data from the raster package. This code runs perfectly on my PC.

library(raster)
library(sp)
library(doParallel)
library(foreach)

## specify the variables to download
vars = c('tmin', 'tmax', 'prec')
## specify the workdir
workdir <- getwd()

cl <- makeCluster(4)
registerDoParallel(cl)
start_time <- Sys.time()
##initiate the foreach loop
foreach(var = vars,.packages=c('raster','doParallel','foreach','base','naturalsort','ncdf4')) %dopar% {

  file <- getData("worldclim",var=var,res=10, lon=5, lat=45)

  ##initiate another foreach loop
  foreach(i = 1:12,.packages=c('raster','doParallel','foreach','base','naturalsort','ncdf4')) %dopar% {
    ## read the i-th layer (month) of the raster stack
    ras <- file[[i]]
    ## write the layer to a temporary GeoTIFF inside workdir
    tmpfile <- file.path(workdir, paste0(gsub(".nc", "", var, fixed = TRUE), "tempfile.tif"))
    writeRaster(ras, tmpfile, format = "GTiff", overwrite = TRUE)
    ## create the output file name
    outfile <- file.path(workdir, paste0(gsub(".nc", "", var, fixed = TRUE), "_", i, ".map"))
    ## use gdal to translate the file format
    system(paste("gdal_translate -of PCRaster", shQuote(tmpfile), shQuote(outfile)))

  }
}
stopCluster(cl)
end_time <- Sys.time()
end_time - start_time

My machine has an AMD Ryzen Threadripper 2990WX processor (32 cores, 64 threads) and 128 GB of RAM.

However, if I use a large number of rasters (10,000), as in my real use case, the same code fails with the error:

Error in unserialize(socklist[[n]]) : error reading from connection

The code runs for a while and generates some files. Then the RStudio front end stops and reports the error above. In the background, files continue to be generated, but not for all variables. For instance, with a cluster of 4 workers, 2 workers stopped after a while and the other 2 kept generating files even though RStudio had already shown the error message. Sometimes my PC crashes and restarts automatically while this is running.

I also checked the log file to see if something was wrong, as mentioned here and here. I get the following log output when I use makeCluster(4, outfile = "log.txt"):

starting worker pid=7276 on localhost:11852 at 12:34:27.693
starting worker pid=1396 on localhost:11852 at 12:34:27.846
starting worker pid=8060 on localhost:11852 at 12:34:27.990
starting worker pid=13648 on localhost:11852 at 12:34:28.140
Loading required package: sp
Loading required package: sp
Loading required package: sp
Loading required package: sp
Loading required package: foreach
Loading required package: foreach
Loading required package: iterators
Loading required package: iterators
Loading required package: foreach
Loading required package: foreach
Loading required package: parallel
Loading required package: parallel
Loading required package: iterators
Loading required package: iterators
Loading required package: parallel
Loading required package: parallel
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
In addition: Warning messages:
1: package 'raster' was built under R version 3.4.4 
2: package 'sp' was built under R version 3.4.4 
3: package 'naturalsort' was built under R version 3.4.4 
Execution halted
Error in unserialize(node$con) : error reading from connection
Calls: <Anonymous> ... doTryCatch -> recvData -> recvData.SOCKnode -> unserialize
In addition: Warning messages:
1: package 'raster' was built under R version 3.4.4 
2: package 'sp' was built under R version 3.4.4 
3: package 'naturalsort' was built under R version 3.4.4 
Execution halted 

If I run the same code with similar versions of R, the packages, and RStudio on another Windows machine (Intel(R) Core(TM) i7-6700 CPU @ 3.4GHz), it works perfectly.

Is this something related to AMD Ryzen processors?

user3978632
  • While I can't answer your specific question, I'd like to point out that *if* you want the inner `foreach` loop to actually take advantage of parallelism, you need to nest with `%:%` as described in the [vignette](https://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf). In some cases it is better to have only one level of parallelism, but in that case you could use `%do%` in the inner code. – Alexis Aug 09 '19 at 22:14
  • Thanks, @Alexis. Though it did not answer my question, I found the vignette quite helpful. – user3978632 Aug 12 '19 at 14:25
  • I've received this error when my machine runs out of memory when performing parallel applications. Does the `other windows machine` have more RAM? – CPak Aug 12 '19 at 20:34
  • @CPak The other Windows machine has even less RAM (64 GB) than this one (128 GB), where the code crashes. It's not a RAM issue, as I have more than sufficient RAM. I checked the Task Manager and barely 15% of the RAM is utilized. – user3978632 Aug 12 '19 at 20:38
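
For readers following the `%:%` suggestion in the comments, here is a minimal sketch of that nesting. It is an illustration under assumptions, not the code from the question: the loop body only builds the output file names (the `paste0` call is a placeholder for the raster work), and the 2-worker cluster size is arbitrary.

```r
library(foreach)
library(doParallel)

# Assumption: a small 2-worker cluster, chosen only for illustration
cl <- makeCluster(2)
registerDoParallel(cl)

vars <- c("tmin", "tmax", "prec")

# %:% merges both loops into one stream of (var, i) tasks,
# so all 3 x 12 combinations are dispatched to the workers
res <- foreach(var = vars, .combine = c) %:%
  foreach(i = 1:12, .combine = c) %dopar% {
    # Placeholder for the real per-layer work (read, write, translate)
    paste0(var, "_", i, ".map")
  }

stopCluster(cl)

length(res)
```

With this form, anything needed per variable (such as loading the raster stack) would have to happen inside the merged loop body, which may or may not suit the real workload; using `%do%` for the inner loop, as the same comment notes, is the alternative when one level of parallelism is enough.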

0 Answers