I need to multi-thread my R application as it takes 5 minutes to run and is only using 15% of the computers available CPU.
An example of a process which takes a while to run is calculating the mean of a very large raster stack containing n layers:
mean = cellStats(raster_layers[[n]], stat='sd', na.rm=TRUE)
Using the parallel library, I can create a new cluster and pass a function to it:
cl <- makeCluster(8, type = "SOCK")
parLapply(cl, raster_layers[[1]], mean_function)
stopCluster(cl)
where mean function is:
mean_function <- function(raster_object)
{
result = cellStats(raster_object, stat='mean', na.rm=TRUE)
return(result)
}
This method works fine except that it can't see the 'raster' package which is required to use cellStats. So it fails saying no function for cellStats. I have tried including the library within the function but this doesnt help.
The raster package comes with a cluster function, and it CAN see the function cellStats, however as far as I can tell, the cluster function must return a raster object and must be passed a single raster object which isn't flexible enough for me, I need to be able to pass a list of objects and return a numeric variable... which I can do with normal clustering using the parallel library if only it can see the raster package functions.
So, does anybody know how I can pass a package to a node with multi-threading in R? Or, how I can return a single value from the raster cluster function perhaps?