
I wrote a function in which I define variables and load objects. Here's a simplified version:

fn1 <- function(x) {
  load("data.RData") # a vector named "data"
  source("myFunctions.R")
  library(raster)
  library(rgdal)
  library(snow) # provides makeSOCKcluster(), used in fn2

  a <- 1
  b <- 2
  r1 <- raster(ncol = 10, nrow = 10)
  r1 <- init(r1, fun = runif)
  r2 <- r1 * 100
  names(r1) <- "raster1"
  names(r2) <- "raster2"
  m <- stack(r1, r2) # basically, a list of two rasters in which it is possible to access a raster by its name, like this: m[["raster1"]]

  c <- fn2(m)
}

Function "fn2" can be found in "myFunctions.R" and is defined as:

fn2 <- function(x) {
  fn3 <- function(y) {
   x[[y]] * 100 * data
  }

  cl <- makeSOCKcluster(8)   
  clusterExport(cl, list("x"), envir = environment()) 
  clusterExport(cl, list("a", "b", "data")) 
  clusterEvalQ(cl, c(library(raster), library(rgdal), rasterOptions(maxmemory = a, chunksize = b))) 
  f <- parLapply(cl, names(x), fn3)  
  stopCluster(cl)
  f # return the results of parLapply
}

Now, when I run fn1, I get an error like this:

Error in get(name, envir = envir) : object 'a' not found

From ?clusterExport, I understand that the default value for envir is .GlobalEnv, so I assumed that "a" and "b" would be accessible to fn2. However, that doesn't seem to be the case. How can I access the environment to which "a" and "b" belong?

So far, the only solution I have found is to pass "a" and "b" as arguments to fn2. Is there a way to use these two variables in fn2 without passing them as arguments?
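For comparison, the argument-passing workaround might look roughly like the sketch below. It assumes the parallel package (whose clusterExport() has the same envir = .GlobalEnv default as snow's) and a hypothetical fn2_args() in place of fn2, with the raster parts dropped. Because a and b are arguments, they live in the call's own environment, so clusterExport() is pointed there with envir = environment():

```r
library(parallel)

# Hypothetical stand-in for fn2: a and b are arguments, so they live in this
# call's environment, and clusterExport() is pointed at that environment.
fn2_args <- function(a, b) {
  cl <- makeCluster(2)              # 2 PSOCK workers (the question uses 8)
  on.exit(stopCluster(cl))          # stop the workers even if an error occurs
  clusterExport(cl, c("a", "b"), envir = environment())
  # clusterEvalQ() evaluates in each worker's global environment, which is
  # where clusterExport() put the copies of a and b
  clusterEvalQ(cl, a + b)
}

res <- fn2_args(a = 1, b = 2)
unlist(res)   # 3 3 (one value per worker)
```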

Thanks a lot for your help.

Guilôme

1 Answer


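You're getting the error when calling clusterExport(cl, list("a", "b", "data")) because clusterExport looks up the variables in .GlobalEnv by default, but fn1 creates them in its own local environment, not in .GlobalEnv. The same goes for data: load() puts the loaded objects in the environment of the function that calls it, which here is fn1.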
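A minimal reproduction of this failure, assuming the parallel package instead of snow and a hypothetical local_export() function playing the role of fn1/fn2, in a fresh session where no global a exists:

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical fn1-like function: a is created in the function's own
# environment, not in .GlobalEnv
local_export <- function(cl) {
  a <- 1
  clusterExport(cl, "a")   # envir defaults to .GlobalEnv, so a is not found
}

# Capture the error message instead of stopping the script
msg <- tryCatch({ local_export(cl); "no error" },
                error = function(e) conditionMessage(e))
print(msg)

stopCluster(cl)
```

In an English locale the printed message should be the same "object 'a' not found" that the question reports.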

An alternative is to pass the local environment of fn1 to fn2, and specify that environment to clusterExport. The call to fn2 would be:

c <- fn2(m, environment())

If fn2 is defined as function(x, env), then the call to clusterExport would be:

clusterExport(cl, list("a", "b", "data"), envir = env)
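Putting both pieces together, a rough sketch of this approach might look like the following. It assumes the parallel package rather than snow, replaces the raster objects with plain numbers, and uses simplified stand-ins for fn1, fn2 and fn3 (not the question's exact code). fn3 is defined at top level so that, on each worker, data is looked up in the worker's global environment, where clusterExport() places it:

```r
library(parallel)

# Worker function defined at top level: on each worker, `data` is found in
# that worker's global environment, where clusterExport() puts it.
fn3 <- function(v) v * data

fn2 <- function(x, env) {
  cl <- makeCluster(2)        # 2 PSOCK workers for the sketch
  on.exit(stopCluster(cl))    # always shut the workers down
  # Look up a, b and data in the caller's environment, not in .GlobalEnv
  clusterExport(cl, c("a", "b", "data"), envir = env)
  # Expressions sent with clusterEvalQ() can now use a and b
  ab <- clusterEvalQ(cl, a + b)
  list(ab = ab, res = parLapply(cl, x, fn3))
}

fn1 <- function() {
  data <- 2   # stands in for the vector loaded from data.RData
  a <- 1
  b <- 2
  fn2(1:3, environment())   # hand fn1's local environment to fn2
}

out <- fn1()
unlist(out$ab)    # 3 3 (one value per worker)
unlist(out$res)   # 2 4 6
```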

Since environments are passed by reference, there should be no performance problem doing this.
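A small illustration of the reference semantics, using hypothetical set_flag() and caller() functions: a change made through the environment argument is visible in the caller, showing that the environment itself was not copied. Only the variables named in clusterExport() are then serialized and sent to the workers.

```r
# Hypothetical function that modifies an environment it receives
set_flag <- function(env) {
  assign("flag", TRUE, envir = env)   # writes into the caller's environment
}

caller <- function() {
  e <- environment()   # caller's own local environment
  set_flag(e)          # passes a reference to e, not a copy
  exists("flag", envir = e, inherits = FALSE)
}

caller()   # TRUE: set_flag() changed caller's environment in place
```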

Steve Weston
  • Thanks for making this clearer, @Steve Weston. I agree with you, it's a bit messy. I gave it more thought, and I can load the data within the function where it is used. As for the cluster calls, I will create the cluster object (cl) and call clusterEvalQ from my main function, fn1, and then pass cl as an argument to the functions that need to call clusterExport() before calling parallel functions like parLapply. – Guilôme Nov 26 '13 at 14:47
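The plan in this comment (create the cluster once in fn1, then pass cl to helpers that export what they need) could be sketched roughly as below. It assumes the parallel package; scale_all() and times_k() are hypothetical helpers, and clusterEvalQ(cl, library(stats)) merely stands in for the raster/rgdal setup:

```r
library(parallel)

# Worker function defined at top level, so `k` is found in each worker's
# global environment (where clusterExport() puts it).
times_k <- function(v) v * k

# Hypothetical helper: receives an already-running cluster, exports the
# variable it needs from its own environment, then runs the parallel call.
scale_all <- function(cl, x, k) {
  clusterExport(cl, "k", envir = environment())
  parLapply(cl, x, times_k)
}

fn1 <- function() {
  cl <- makeCluster(2)                # create the cluster once
  on.exit(stopCluster(cl))            # and stop it when fn1 exits
  clusterEvalQ(cl, library(stats))    # one-time worker setup (stand-in)
  r1 <- scale_all(cl, 1:3, k = 10)    # reuse the same workers...
  r2 <- scale_all(cl, 1:3, k = 100)   # ...for several parallel calls
  list(unlist(r1), unlist(r2))
}

out <- fn1()
out   # list(c(10, 20, 30), c(100, 200, 300))
```

Creating the cluster once avoids paying the worker start-up and package-loading cost on every call; each later clusterExport() simply overwrites the workers' copy of k.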