
I would like to run JAGS models in parallel on my Windows computer with 4 cores, but I have not been able to figure out why my model will not run. I have searched the web extensively, including these posts:

http://andrewgelman.com/2011/07/23/parallel-jags-rngs/

http://users.soe.ucsc.edu/~draper/eBay-Google-2013-parallel-rjags-example.txt

When I run a simple example (see code below) with `%do%`, the model runs fine (serially, of course). When I use `%dopar%`, I receive the error: `Error in { : task 1 failed - "Symbol table is empty"`

library(rjags)
library(coda)
library(foreach)              
library(doParallel)
library(random)     
load.module("lecuyer")

###  Data generation 
y <- rnorm(100)
n <- length(y)
win.data <- list(y=y, n=n)

# Define model
sink("model.txt")
cat("
    model {
    # Priors
    mu ~ dnorm(0, 0.001)
    tau <- 1 / (sigma * sigma)
    sigma ~ dunif(0, 10)
    # Likelihood
    for (i in 1:n) {
      y[i] ~ dnorm(mu, tau)
    }
}
",fill=TRUE)
sink()

inits <- function(){ list(mu=rnorm(1), sigma=runif(1, 0, 10),
                 .RNG.name = "lecuyer::RngStream", 
                 .RNG.seed = as.numeric(randomNumbers( n = 1, min = 1, max = 1e+06, col = 1 )) ) }
params <- c('mu','sigma')  

cl <- makePSOCKcluster(3)              
clusterSetRNGStream(cl)
registerDoParallel(cl)      
model.wd <- paste(getwd(), '/model.txt', sep='')     # I wondered if the cores were having trouble finding the model.         

m <- foreach(i=1:3, .packages=c('rjags','random','coda'), .multicombine=TRUE) %dopar% {  
                load.module( "lecuyer" )  
                model.jags <- jags.model(model.wd, win.data, inits=inits, n.chains=1, n.adapt=1000, quiet=TRUE)
                result <- coda.samples(model.jags, params, 1000, thin=5)
                return(result)
              }            
stopCluster(cl)
# Error in { : task 1 failed - "Symbol table is empty"


sessionInfo()
# R version 3.0.1 (2013-05-16)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# 
# locale:
#   [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
# [4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    
# 
# attached base packages:
#   [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#   [1] random_0.2.1     doParallel_1.0.3 iterators_1.0.6  foreach_1.4.1    rjags_3-10       coda_0.16-1     
# [7] lattice_0.20-21 
# 
# loaded via a namespace (and not attached):
#   [1] codetools_0.2-8 compiler_3.0.1  grid_3.0.1      tools_3.0.1  

More Details:

The problem occurs on a Windows 7 computer with NO admin privileges, but not on a computer WITH admin privileges. It occurs with both Rgui and Rterm, and with the new rjags package 3-11. The error is raised within the function `jags.model`.

The problem appears to stem from a mismatch in writing and reading files to a temporary directory. When I start R, it automatically creates a temporary folder. When I close R, this folder is automatically deleted, unless it contains files.

For example, when I start R it creates this folder: C:\Users\jesse whittington\AppData\Local\Temp\RtmpoBe1gw.

When I run an rjags model with

m <- jags.model(file='model.txt', data=win.data, inits=inits, n.chains=3, n.adapt=1000, quiet=FALSE)

No files are written to this temporary directory.

When I run 3 chains serially with foreach and %do%, 3 temporary files are written to this folder. These files are 1 KB in size, and when I open them with a text editor they appear blank.

wd <- getwd()                   
cl <- makePSOCKcluster(3, outfile=paste(wd,'/Out_messages.txt', sep=''))   # 3 chains           
clusterSetRNGStream(cl)
registerDoParallel(cl)   
m <- foreach(i=1:3, .packages=c('rjags','random','coda'), .multicombine=TRUE) %do% {  
                load.module( "lecuyer" ) 
                result <- jags.model(file='model.txt', data=win.data, inits=inits, n.chains=1, n.adapt=1000, quiet=FALSE)
                return(result)
              }  
stopCluster(cl) 

When I run 3 chains in parallel with foreach and %dopar%, 3 temporary files are written to the folder ..Temp\RtmpoBe1gw. The error messages in the outfile suggest that the function is looking for DIFFERENT files in DIFFERENT temporary directories. When I include a line that generates a temporary file name with tempfile(), I see that 3 new temporary folders are created (they are later deleted by stopCluster). jags.model looks in these 3 folders for the temporary files and fails because there is nothing in them. Thus, I suspect that the temporary files are written to one temporary directory (associated with the parent R session), and that jags.model then fails when it tries to open different temporary files in the 3 temporary directories created within foreach.

wd <- getwd()                   
cl <- makePSOCKcluster(3, outfile=paste(wd,'/Out_messages.txt', sep=''))   # 3 chains           
clusterSetRNGStream(cl)
registerDoParallel(cl)   
m <- foreach(i=1:3, .packages=c('rjags','random','coda'), .multicombine=TRUE) %dopar% {  
                load.module( "lecuyer" ) 
                tmp <- tempfile()   # each worker reports a file name inside its own temporary directory
                print(tmp)
                result <- jags.model(file='model.txt', data=win.data, inits=inits, n.chains=1, n.adapt=1000, quiet=FALSE)
                return(result)
              }  
stopCluster(cl) 

From Out_messages.txt

starting worker pid=4396 on localhost:11109 at 08:34:06.430
starting worker pid=6548 on localhost:11109 at 08:34:06.879
starting worker pid=6212 on localhost:11109 at 08:34:07.418
Loading required package: coda
Loading required package: lattice
Loading required package: coda
Loading required package: lattice
Loading required package: coda
Loading required package: lattice
Linked to JAGS 3.3.0
Loaded modules: basemod,bugs
Linked to JAGS 3.3.0
Loaded modules: basemod,bugs
Linked to JAGS 3.3.0
Loaded modules: basemod,bugs
module lecuyer loaded
module lecuyer loaded
module lecuyer loaded
[1] "C:\\Users\\JESSEW~1\\AppData\\Local\\Temp\\RtmpQbPAVC\\file112c8077a0"  # Note this is from: tmp <- tempfile()
[1] "C:\\Users\\JESSEW~1\\AppData\\Local\\Temp\\RtmpMPMpcY\\file199489564c6"
[1] "C:\\Users\\JESSEW~1\\AppData\\Local\\Temp\\Rtmpk9vMR5\\file18445f6b2fd4"
Compiling model graph
Compiling model graph
Compiling model graph

Warning messages:
1: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Unused variable "y" in data
2: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Unused variable "n" in data
3: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Failed to open file C:\Users\JESSEW~1\AppData\Local\Temp\RtmpQbPAVC\file112c394b4eef
Nothing to compile

4: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Unused initial value for "mu" in chain 1
5: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Unused initial value for "sigma" in chain 1
6: In jags.model(file = "model.txt", data = win.data, inits = inits,  :
  Can't initialize. No nodes in graph (Have you compiled the model?)

The folder RtmpQbPAVC is created but the file file112c394b4eef does not exist.
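
The observation that each worker has its own temporary directory can be checked without JAGS at all. The following is a minimal sketch using only the base `parallel` package; the cluster size of 2 and the variable names (`master_tmp`, `worker_tmp`, `worker_exists`) are choices made for illustration and do not come from the original script.

```r
library(parallel)

# Start a small PSOCK cluster; each worker is a separate R process.
cl <- makePSOCKcluster(2)

# Temporary directory of the parent (master) R session.
master_tmp <- tempdir()

# clusterEvalQ() evaluates the expression on every worker and returns a list,
# so this collects one temporary directory path per worker.
worker_tmp <- unlist(clusterEvalQ(cl, tempdir()))

print(master_tmp)
print(worker_tmp)

# Ask each worker whether its own temporary directory exists on disk.
worker_exists <- unlist(clusterEvalQ(cl, file.exists(tempdir())))
print(worker_exists)

stopCluster(cl)
```

If the worker paths differ from the master path, any file name generated with `tempfile()` in one process should not be expected to exist in another.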

Steve Weston
Jesse
  • I can't reproduce your error on Linux: your code seems to work fine on my machine. However, I noticed that you're not passing the cluster object `cl` to `registerDoParallel`. I don't think that is the cause of your error, but it does mean that the cluster that you will actually use won't be the one initialized by `clusterSetRNGStream`. – Steve Weston Aug 26 '13 at 14:04
  • Thanks for checking Steve and for pointing out that I did not pass cl to registerDoParallel. – Jesse Aug 27 '13 at 02:20
  • @Steve The script runs fine on a Windows 7 computer WITH admin privileges. It continues to fail on my Windows 7 computer with NO admin privileges. The script fails on the call to `jags.model`. The function detects errors in the model name, and invalid functions in inits, but not errors in win.data. – Jesse Aug 27 '13 at 03:00
  • I can't replicate the error in Windows Server 2003 without admin privileges. Can you confirm that the `jags.model` line works correctly when used outside of the `foreach` (serially)? – nograpes Aug 29 '13 at 21:45
  • @nograpes The `jags.model` call does work fine serially, and it works fine if I change `%dopar%` to `%do%`. I installed the JAGS program a while back in C:/Stats I think (if I remember correctly) because it wouldn't work when installed in Program Files (Windows 7). I wonder if the parallel processes are running in some directory where they cannot create files. For example, I cannot copy-paste any files into my Program Files directory because I lack admin privileges. Thanks for testing! – Jesse Aug 30 '13 at 00:35
  • Could it have anything to do with the use of Rscript vs. R? Could the R executable have privileges in some sense that Rscript does not? (You can tell I'm Windows ignorant...) Could you try running the %do% version using Rscript to see if that causes it to fail? – Steve Weston Aug 30 '13 at 13:08
  • Given the way you're creating the cluster, the workers should be in the same directory as the master. You can verify that with `clusterEvalQ(cl, getwd())`. You could also try running a simple test to see if the workers can create files in their current directory. – Steve Weston Aug 30 '13 at 13:48
  • I removed the space from the directory name, that had no effect. Similarly, the problem persists using R and RStudio. You're correct that `clusterEvalQ(cl, getwd())` prints the current directory, as does `return(getwd())` within a foreach call. "C:/a_U_drive/R_scripts/Parallel_MCMC". – Jesse Aug 30 '13 at 19:46
  • The error message occurs during this line of code within the `jags.model` function: `model.data <- .Call("get_data", p, PACKAGE = "rjags")`. Several other compiled functions work fine before this line of code. – Jesse Aug 31 '13 at 12:27
  • I suspect something went wrong before the call to `get_data`, but it wasn't reported until then. You should make sure that you see any warning messages from the workers by setting `quiet=FALSE` when calling jags.model, and also use the makePSOCKcluster `outfile` option so that output isn't dumped. Unfortunately, `outfile=''` may not work if you're using an R GUI, which is my usual advice. – Steve Weston Aug 31 '13 at 14:18
  • There is a new rjags 3-11 on CRAN that was released very recently, and it looks like it has better error checking in the jags.model function. That might result in a more informative error message for your problem if you're lucky. – Steve Weston Aug 31 '13 at 14:24
  • I installed rjags 3-11. Same problem. I worked with the outfile option you suggested and narrowed the problem a little. When I start Rgui or Rterm, R creates a temporary folder. jags.model writes a file (one per chain) into that temporary directory and then reads it again. When running in parallel, jags.model writes a file to that directory, but then searches for that file in a DIFFERENT, non-existent temporary directory. The main error is: Failed to open file `C:\Users\Jesse... Nothing to compile`. There are 5 other errors about unused variables & initial values. THANKS for your help! – Jesse Aug 31 '13 at 16:46
  • Could you put this information into the question with more detail? Either that, or put it into an answer if you've figured out the cause of the problem. I'm not sure if this is a bug in the Windows version of R or in the rjags package or something else. – Steve Weston Aug 31 '13 at 21:45
  • This is interesting, but why didn't you try the `jags.parallel` or the `runjags` package? That could possibly be simpler than trying to do the parallel job yourself. – Tomas Jan 26 '14 at 19:27
  • @Tomas I ran into similar issues with jags.parallel - I couldn't get it to work even with simple models. runjags worked for some models but crashed on others. Those models took many days to run so I didn't spend a lot of time trying to figure out why. – Jesse Jan 27 '14 at 12:32
  • That's depressing! I don't want to program it all myself, these basic packages should work! There are two posts on `jags.parallel`, maybe they can help you make it work http://stackoverflow.com/q/17808575/684229 and http://stackoverflow.com/q/16723036/684229. But I don't have an ultimate solution. Please let me know if you find a solution, I must do the same. – Tomas Jan 27 '14 at 13:13
  • @Tomas I tried my foreach scripts this morning and now they work. No idea why - our network updates must have changed our read-write access. Sorry to hear you have the same frustrations! – Jesse Jan 27 '14 at 18:42

3 Answers


I have identified the source of the problem. I can write and read files to and from a temporary directory when using R normally. When in parallel, I can write files to the temporary directories, but I do NOT have permission to read files.

The problem occurs both writing and reading text files (using writeLines and readLines) and csv files.

I have since found that if I receive the message "Error in { : task 1 failed - cannot open the connection", I can rectify the problem by deleting all temporary files in TEMP. For some locked files, I have to shut down and restart the computer before I am able to delete them. Even so, within the same R session I might receive the error message and then be able to run the program successfully on my next try. The problem likely stems from our government anti-virus software and/or the structure of our remote network access.

Here is an example that writes and reads text files for simplicity.

library(foreach)              
library(doParallel)
wd <- getwd()
data <- data.frame(x=1:10, y=1:10)

This works fine.

modfile <- tempfile()
print(modfile)
# "C:\\Users\\JESSEW~1\\AppData\\Local\\Temp\\RtmpsvYfFk\\filef38a272022"
write.csv(data, modfile, row.names=FALSE)
m <- read.csv(modfile) 

This does not work

cl <- makePSOCKcluster(3, outfile=paste(wd,'/Out_messages.txt', sep=''))   # 3 chains           
clusterSetRNGStream(cl)
registerDoParallel(cl)   
m <- foreach(i=1:3) %dopar% {  
  modfile <- tempfile()
  write.csv(data, modfile, row.names=FALSE)
  x <- read.csv(modfile)
  return(x)
}  
# Error in { : task 1 failed - "cannot open the connection"
stopCluster(cl) 

Here is the output from Out_messages.txt. Note the "Permission denied" on the far right.

starting worker pid=6852 on localhost:11611 at 22:09:19.488
starting worker pid=6984 on localhost:11611 at 22:09:19.926
starting worker pid=3384 on localhost:11611 at 22:09:20.441
Warning message:
  Warning message:
  In file(con, "r") :
  cannot open file 'C:\Users\JESSEW~1\AppData\Local\Temp\Rtmp6dEZLP\file1ac44a506032': Permission denied
In file(con, "r") :
  cannot open file 'C:\Users\JESSEW~1\AppData\Local\Temp\RtmpuydRvR\file1b48185f2a2d': Permission denied
Warning message:
  In file(con, "r") :
  cannot open file 'C:\Users\JESSEW~1\AppData\Local\Temp\RtmpAbOIng\filed382ef37d51': Permission denied
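
To see which step fails on which worker without the whole job aborting, the write/read round trip can be wrapped in `tryCatch` so that each worker reports its result. This is only a diagnostic sketch under my own assumptions: `roundtrip` is a hypothetical helper, the cluster size of 2 is arbitrary, and checking both `tempdir()` and `getwd()` is one possible choice of directories.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# Hypothetical helper: write a small file into `dir`, read it back, and
# return a one-row data frame describing the outcome instead of raising an error.
roundtrip <- function(dir) {
  f <- tempfile(tmpdir = dir)
  tryCatch({
    writeLines(letters, f)
    ok <- identical(readLines(f), letters)
    unlink(f)  # clean up the test file
    data.frame(dir = dir, ok = ok, message = "", stringsAsFactors = FALSE)
  }, error = function(e) {
    data.frame(dir = dir, ok = FALSE, message = conditionMessage(e),
               stringsAsFactors = FALSE)
  })
}

# Run the check on every worker: once in the worker's own tempdir() and
# once in its working directory. The helper is passed as an argument so
# that it is shipped to the workers together with the call.
report <- do.call(rbind, clusterCall(cl, function(check) {
  rbind(check(tempdir()), check(getwd()))
}, roundtrip))

print(report)

stopCluster(cl)
```

On a machine showing the problem described above, the `message` column would be expected to contain the "cannot open the connection" text for the affected directories, which makes it easier to compare locations side by side.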
Jesse
  • This is great progress, but still leaves the question of why the use of parallel/doParallel/foreach creates this strange situation. I'm still wondering if it has anything to do with Rscript. If you isolated the body of the foreach loop into a script, does it fail with Rscript and succeed with 'R -f'? It would be great if it was that simple. – Steve Weston Sep 02 '13 at 13:29
  • It may be helpful to see a listing of the permissions on the temp directories using the Windows "icacls" command. Get the temp directory names with `tdirs <- clusterEvalQ(cl, tempdir())` and then execute "icacls" from within R using `system(paste("icacls", tdirs[1]), input="")`, for example. That may shed light on why the master can read and write files in its tempdir, while the cluster workers can only write files in their own tempdirs. – Steve Weston Sep 02 '13 at 14:01
  • The problem occurs when I use Rgui.exe and R.exe. Printed permissions are the same for Rgui.exe & R.exe and also for master tempdir and cluster tempdir. Printed permissions are: `C:\Users\JESSEW~1\AppData\Local\Temp\RtmpqeRGYq NT AUTHORITY\SYSTEM:(I)(OI)(CI)(F)` `BUILTIN\Administrators:(I)(OI)(CI)(F)` `APCA2\Jesse Whittington:(I)(OI)(CI)(F)` – Jesse Sep 02 '13 at 15:25
  • Note that the folder C:\Users\jesse whittington\ has a padlock icon beside it. I'm (unfortunately) using a government laptop that likely has additional restriction on permissions. The problem occurs when I'm both logged into the network and when I'm off the network. I am able to manually copy, paste, and open temporary files within the cluster tempdir. – Jesse Sep 02 '13 at 16:01
  • Can you execute: `Rscript -e 'x=tempfile();writeLines(letters,x);readLines(x)'` – Steve Weston Sep 02 '13 at 17:50
  • On Windows, I think you need to use double-quotes around the expression following -e, so execute this instead: `Rscript -e "x=tempfile();writeLines(letters,x);readLines(x)"`. – Steve Weston Sep 02 '13 at 18:55
  • OK, I see what you mean by Rscript. This works: `C:\Stats\R\R-3.0.1\bin\Rscript.exe -e 'x=tempfile();writeLines(letters,x);readLines(x)'`. Using Rscript and foreach does not work with %do% or %dopar%: `Rscript -e 'library(foreach);library(doParallel);cl <- makePSOCKcluster(3);registerDoParallel(cl);m <- foreach(i=1:3) %dopar% {x=tempfile();writeLines(letters,x);readLines(x)};stopCluster(cl)'`. The message says: "The system cannot find the file specified". One thing that could influence my problems is that our IT department sets up our network drive (U:/) as our default drive. – Jesse Sep 02 '13 at 19:16
  • I see the double-quote message now. The `Rscript -e "x=tempfile();writeLines(letters,x);readLines(x)"` says permission denied. – Jesse Sep 02 '13 at 21:07
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/36724/discussion-between-steve-weston-and-jesse) – Steve Weston Sep 03 '13 at 12:43

Steve brought this to my attention, but your second example shows that it is not a problem with rjags. I am unable to reproduce the bug in either example using the same setup (Windows 7, R 3.0.1, JAGS 3.0.3, ordinary user without admin access).

Martyn Plummer
  • I agree with you. I'm guessing that it's a Windows security/configuration issue, but I'm very curious why it only happens when running in parallel. Thanks for looking into the issue. – Steve Weston Sep 02 '13 at 16:59

Since the errors are caused by writing and reading the model file, I suggest that you bypass that issue by using the "textConnection" function. This can be used to create a file-like object without creating an actual file, thus avoiding the need for temporary files. I modified your example to demonstrate this:

library(rjags)
library(doParallel)
library(random)
load.module("lecuyer")
y <- rnorm(100)
n <- length(y)
win.data <- list(y=y, n=n)
model <- "
  model {
    # Priors
    mu ~ dnorm(0, 0.001)
    tau <- 1 / (sigma * sigma)
    sigma ~ dunif(0, 10)
    # Likelihood
    for (i in 1:n) {
      y[i] ~ dnorm(mu, tau)
    }
  }"
inits <- function() {
  list(mu=rnorm(1), sigma=runif(1, 0, 10),
       .RNG.name="lecuyer::RngStream",
       .RNG.seed=as.numeric(randomNumbers(n=1, min=1, max=1e+06, col=1)))
}
params <- c('mu', 'sigma')
cl <- makePSOCKcluster(3)
clusterSetRNGStream(cl)
registerDoParallel(cl)

m <- foreach(i=1:3, .packages=c('rjags', 'random'),
             .combine='c', .final=mcmc.list) %dopar% {
  load.module( "lecuyer" )
  model.jags <- jags.model(textConnection(model), win.data, inits=inits,
                           n.chains=1, n.adapt=1000, quiet=TRUE)
  coda.samples(model.jags, params, 1000, thin=5)
}
stopCluster(cl)   # shut down the workers once sampling has finished

I also changed the result handling so that the value returned by the foreach loop is an "mcmc.list" object, which is what the "coda.samples" function returns.
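
For readers unfamiliar with `textConnection`, here is a small base-R sketch of what "a file-like object" means in practice: the model string can be read line by line, just as if it came from a file on disk. The shortened model text and the name `lines_read` are illustrative only. The sketch shows the connection behaviour on the R side; it does not check what `jags.model` does internally with the connection.

```r
# A shortened model definition held in a character string (illustrative only).
model <- "
  model {
    mu ~ dnorm(0, 0.001)
  }"

# textConnection() wraps a character vector in a read-only text connection,
# so functions that accept a connection can read it like a file.
con <- textConnection(model)
lines_read <- readLines(con)
close(con)

print(lines_read)
```

Because the string is exported to the workers along with the other variables used in the `foreach` body, no model file has to be located on disk by the worker processes.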

Steve Weston
  • I changed my TMP, TEMP, and TMPDIR environmental variables to C:\Temp. All 4 scenarios above result in permission denied type errors. I also noticed that in the doParallel and parallel examples above, temporary Rscript files were created in C:\Temp (e.g. Rscript1a44138d41). – Jesse Sep 02 '13 at 21:53
  • @Jesse At this point, I think you should show the problem to your sysadmins. They may know what is causing this problem, and they may be able to suggest a temp directory that doesn't have this permission problem. Or you could try another website that specializes in system administration, since this is no longer a programming issue. – Steve Weston Sep 03 '13 at 13:09