Using jags.parallel from within a function (R language Error in get(name, envir = envir) : object 'y' not found)

Question

Using jags.parallel from the command line or a script works fine. For example, I can run this modified example from http://www.inside-r.org/packages/cran/R2jags/docs/jags without problems:

# An example model file is given in:
  model.file <- system.file(package="R2jags", "model", "schools.txt")
#=================#
# initialization  #
#=================#

  # data
  J <- 8.0
  y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
  sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)

  jags.data <- list("y","sd","J")
  jags.params <- c("mu","sigma","theta")
  jags.inits <- function(){
    list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
  }


#===============================#
# RUN jags and postprocessing   #
#===============================#
#  jagsfit <- jags(data=jags.data, inits=jags.inits, jags.params, 
#    n.iter=5000, model.file=model.file)

  # Run jags in parallel, no progress bar. R may be frozen for a while,
  # be patient. Currently update afterward does not run in parallel

  print("Running Parallel") 
  jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params, 
    n.iter=5000, model.file=model.file)

However if I wrap it in a function

testparallel <- function(out){
# An example model file is given in:
    .
    .
    .
jagsfit <- jags.parallel(data=jags.data, inits=jags.inits, jags.params, 
  n.iter=5000, model.file=model.file)
print(out)
return(jagsfit)
}

Then I get the error: Error in get(name, envir = envir) : object 'y' not found. Based on what I found here, I know that it is an issue with the environment exported to the cluster, and I have fixed it by changing

J <- 8.0
y <- c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2)
sd <- c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6)

to

  assign("J",8.0,envir=globalenv()) 
  assign("y",c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),envir=globalenv()) 
  assign("sd",c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6),envir=globalenv())

Is there a better way to get around this?

Thank you, Greg

P.S.

I am working on this code for someone else, so I don't really want to change things in the R2jags package to let me pass in the environment to export, though I plan on suggesting it to the authors of the package.

score 5 · Accepted Answer · answered Feb 10 '15 at 23:02

So I have contacted the author of R2jags, and he has added an additional argument to jags.parallel that lets you pass envir, which is then passed on to clusterExport.

This works well except it allows clashes between the name of my data and variables in the jags.parallel function.
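
As a minimal, untested sketch of how the wrapped example from the question might look with that argument: it assumes an R2jags version whose jags.parallel accepts `envir`, and a working JAGS installation. The block only defines the function; nothing runs until it is called.

```r
# Sketch only: assumes jags.parallel() in the installed R2jags has an
# `envir` argument that it forwards to clusterExport().
testparallel <- function() {
  model.file <- system.file(package = "R2jags", "model", "schools.txt")

  # Data stay local to this function's frame; nothing is written to globalenv().
  J  <- 8.0
  y  <- c(28.4, 7.9, -2.8, 6.8, -0.6, 0.6, 18.0, 12.2)
  sd <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)

  jags.data   <- list("y", "sd", "J")
  jags.params <- c("mu", "sigma", "theta")
  jags.inits  <- function() {
    list("mu" = rnorm(1), "sigma" = runif(1), "theta" = rnorm(J))
  }

  R2jags::jags.parallel(
    data = jags.data,
    inits = jags.inits,
    parameters.to.save = jags.params,
    n.iter = 5000,
    model.file = model.file,
    envir = environment()  # export y, sd and J from this frame, not globalenv()
  )
}

# Usage (requires R2jags and JAGS):
# jagsfit <- testparallel()
```

Because the data names are looked up in the environment that is passed in, the clash mentioned here can presumably be reduced by choosing data names that differ from jags.parallel's own argument names.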

Philippe · Answer 2 · 2014-05-23T14:43:01.760

0

If you use JAGS intensively in parallel, I suggest looking at the package rjags combined with the package dclone. I think dclone is really powerful because its syntax is exactly the same as that of rjags. I have never seen your problem with this package.

If you want to use R2jags I think you need to pass your variables and your init function to the workers with the function:

clusterExport(cl, list("jags.data", "jags.params", "jags.inits"))

edited May 23 '14 at 14:43

answered May 23 '14 at 14:38

Philippe


The clusterExport call is inside jags.parallel. The problem is that clusterExport looks up the variables to export in the global environment by default. Since I am calling jags.parallel from within another function, the variables (jags.data, etc.) are in the function's environment, not the global environment. Thus the workers don't get the variables, and the call throws the error I cite. clusterExport does have an optional argument for selecting which environment to export from, but jags.parallel does not. I have suggested a change to the authors. Since I am working on this code for someone else, I don't want to edit the package myself. – goryh May 28 '14 at 13:56
I saw dclone, but I am new to both JAGS and R and am working on a large amount of preexisting code, so I was looking for a way to parallelize the code that requires the smallest number of changes. If you know of a good source for learning about rjags and how to use dclone, I'll look into it more. Thank you for answering. – goryh May 28 '14 at 14:00
After checking the `jags.parallel` function, I think it does not have the argument you need to solve your problem. With the `dclone` package, however, the `makeSOCKcluster` call is not inside the `jags.parfit` function, so you can control the environment of your variables. The equivalent function to `jags.parallel` is `jags.parfit`. The main difference in the arguments is `cl` (the cluster, created before running the function). The JAGS manual and the examples for the `jags.parfit` function are relatively substantial, I think. – Philippe May 28 '14 at 15:27
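
The lookup behaviour described in these comments can be reproduced in base R without a cluster. The sketch below is an illustration only: `found_in` and `wrapper` are hypothetical names, and `exists()` stands in for the `get(name, envir = envir)` call that raises the error in the question.

```r
# Hypothetical stand-in for the lookup step: is `name` visible from `envir`?
# The default mirrors an export that searches the global environment.
found_in <- function(name, envir = globalenv()) {
  exists(name, envir = envir)
}

wrapper <- function() {
  y_local_only <- c(28.4, 7.9)  # lives in wrapper()'s frame, not in globalenv()
  c(
    default  = found_in("y_local_only"),                        # searches globalenv()
    explicit = found_in("y_local_only", envir = environment())  # searches this frame
  )
}

result <- wrapper()
print(result)
```

The default lookup misses the variable because it only exists inside the calling function, while passing `environment()` finds it. That is the same reason an `envir` argument on the export step resolves the original error.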

score 0 · Answer 3 · answered Feb 10 '15 at 02:32

Without changing the code of R2jags, you can still assign those data variables to the global environment in an easier way by using list2env.

Obviously, there is a concern that those variable names could overwrite existing variables in the global environment, but you can probably control for that.

Below is the same code as the example given in the original post except I put the data into a list and sent that list's data into the global environment using the list2env function. (Also I took out the unused "out" variable in the function.) This currently runs fine for me; you may have to add more chains and/or add more iterations to see the parallelism in action, though.

testparallel <- function(){

    library(R2jags)

    model.file <- system.file(package="R2jags", "model", "schools.txt")

    # Make a list of the data with named items.
    jags.data.v2 <- list(
        J=8.0, 
        y=c(28.4,7.9,-2.8,6.8,-0.6,0.6,18.0,12.2),
        sd=c(14.9,10.2,16.3,11.0,9.4,11.4,10.4,17.6) )

    # Store all that data explicitly in the globalenv() as
    # was previously suggested using the assign(...) function.
    # This will do that for you.
    # Now R2jags will have access to the data without you having 
    # to explicitly "assign" each to the globalenv.
    list2env( jags.data.v2, envir=globalenv() )

    jags.params <- c("mu","sigma","theta")
    jags.inits <- function(){
        list("mu"=rnorm(1),"sigma"=runif(1),"theta"=rnorm(J))
    }

    jagsfit <- jags.parallel(
        data=names(jags.data.v2), 
        inits=jags.inits, 
        jags.params, 
        n.iter=5000, 
        model.file=model.file)

    return(jagsfit)
}
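
One way the overwriting concern mentioned above might be handled is to check for name clashes before calling list2env. This is a sketch with a hypothetical helper, `export_without_clobbering`, demonstrated on a throwaway environment rather than on globalenv():

```r
# Hypothetical helper: copy a named list into an environment, but refuse to
# overwrite any variable that already exists there.
export_without_clobbering <- function(data, envir = globalenv()) {
  clashes <- Filter(
    function(nm) exists(nm, envir = envir, inherits = FALSE),
    names(data)
  )
  if (length(clashes) > 0) {
    stop("would overwrite: ", paste(clashes, collapse = ", "))
  }
  list2env(data, envir = envir)
  invisible(names(data))
}

# Demonstration on a scratch environment so the global workspace is untouched.
demo_env <- new.env()
exported <- export_without_clobbering(list(J = 8, y = c(28.4, 7.9)), envir = demo_env)

# A second export of `y` is rejected instead of silently replacing the data.
second_try <- tryCatch(
  export_without_clobbering(list(y = 0), envir = demo_env),
  error = function(e) conditionMessage(e)
)
```

Inside testparallel, the same helper could be called with its default `envir = globalenv()` in place of the bare list2env call.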

Thank you. My problem was that I didn't like having to make the variables global, but it is nice to know there is a better way to make them global. – goryh Feb 10 '15 at 22:56

M.L. · Answer 4 · 2023-05-25T10:59:33.790

This is essentially what R2jags is doing, but rewritten so you can put those variables into the environment by hand with clusterExport(), which loads variables into the blank R session set up separately for each cluster worker:

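library(parallel)  # makeCluster, clusterExport, clusterEvalQ, parLapply, stopCluster
library(coda)      # as.mcmc, as.mcmc.list
jinits <- function(){list(.RNG.name = 'lecuyer::RngStream',
                            .RNG.seed = round(1e+06*runif(1)))}
cl <- makeCluster(mcmc_params$nchains,methods=FALSE, type="PSOCK")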
JAGSmod <- function(seed){
      set.seed(seed) #note this affects jinits, but not rjags itself
      jags_mod <- jags.model(mod_path,datastruct,inits=jinits,n.adapt=mcmc_params$nadapt) 
      update(jags_mod, n.iter=mcmc_params$nburnin)
      mod_samp <- coda.samples(jags_mod, monitorparams, n.iter=mcmc_params$nsamples, thin=mcmc_params$nthin)
      return(mod_samp)
    }
clusterExport(cl,c('mod_path','datastruct','mcmc_params','monitorparams','jinits'),envir=environment()) ## could just do 'JAGSmod','jags.model','coda.samples','load.module' here instead but:
clusterEvalQ(cl, {
      library(rjags)
      load.module('lecuyer')
      load.module('glm')
    }) #runs code in each blank cl instance of R
res <- parLapply(cl,1:mcmc_params$nchains,JAGSmod)
stopCluster(cl)
l_mcmc <- as.mcmc.list(lapply(res,as.mcmc))
parsum <- summary(window(l_mcmc))

This can easily be wrapped in a function, although that might require you to pass in datastruct, mcmc_params, and monitorparams as arguments.
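
A possible shape for such a wrapper is sketched below. It is untested and assumes rjags, coda, and JAGS with the lecuyer module are installed; `run_chains` and `run_one` are hypothetical names, and `mod_path` is also passed in as an argument. The block only defines the function.

```r
library(parallel)

# Sketch only: runs one JAGS chain per worker and combines the samples.
run_chains <- function(mod_path, datastruct, mcmc_params, monitorparams) {
  jinits <- function() {
    list(.RNG.name = "lecuyer::RngStream",
         .RNG.seed = round(1e6 * runif(1)))
  }

  run_one <- function(seed) {
    set.seed(seed)  # affects jinits(), not the JAGS sampler itself
    m <- rjags::jags.model(mod_path, datastruct, inits = jinits,
                           n.adapt = mcmc_params$nadapt)
    update(m, n.iter = mcmc_params$nburnin)
    rjags::coda.samples(m, monitorparams,
                        n.iter = mcmc_params$nsamples,
                        thin = mcmc_params$nthin)
  }

  cl <- makeCluster(mcmc_params$nchains, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)  # stop the workers even on error

  # envir = environment() exports from this function's frame, not globalenv().
  clusterExport(cl,
                c("mod_path", "datastruct", "mcmc_params", "monitorparams", "jinits"),
                envir = environment())
  clusterEvalQ(cl, {
    library(rjags)
    load.module("lecuyer")
  })

  res <- parLapply(cl, seq_len(mcmc_params$nchains), run_one)
  coda::as.mcmc.list(lapply(res, coda::as.mcmc))
}

# Usage (requires rjags and JAGS):
# l_mcmc <- run_chains(mod_path, datastruct, mcmc_params, monitorparams)
```

Using on.exit for stopCluster is a design choice in this sketch, so that worker processes are not left running if a chain fails.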
