
I am using RStan to sample from a large number of Gaussian Processes (GPs), i.e., using the function stan(). For every GP that I fit, another DLL gets loaded, as can be seen by running the R command

getLoadedDLLs()
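
For example, the count of loaded DLLs can be checked against the limit (a small illustration):

length(getLoadedDLLs())  # each stan() call adds one; the hard-coded cap is 100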

The problem I'm running into is that, because I need to fit so many unique GPs, I'm exceeding the maximum number of DLLs that can be loaded, at which point I receive the following error:

Error in dyn.load(libLFile) : 
unable to load shared object '/var/folders/8x/n7pqd49j4ybfhrm999z3cwp81814xh/T//RtmpmXCRCy/file80d1219ef10d.so':
maximal number of DLLs reached...

As far as I can tell, this is set in Rdynload.c of the base R code, as follows:

#define MAX_NUM_DLLS 100

So, my question is, what can be done to fix this? Building R from source with a larger MAX_NUM_DLLS isn't an option, as my code will be run by collaborators who wouldn't be comfortable with that process. I've tried the naive approach of just unloading DLLs using dyn.unload() in the hopes that they'd just be reloaded when they're needed again. The unloading works fine, but when I try to use the fit again, R fairly unsurprisingly crashes with an error like:

*** caught segfault ***
address 0x121366da8, cause 'memory not mapped'
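
For reference, the naive attempt looked roughly like this (the path construction is illustrative; the compiled DLL sits in tempdir() under the name stored in the fit's dso slot):

fit <- stan(file="gp-sim.stan", data=list(x=x,N=N), iter=1, chains=1)
# Unload the DLL that was compiled for this model
dyn.unload(file.path(tempdir(),
                     paste0(fit@stanmodel@dso@dso_filename, .Platform$dynlib.ext)))
print(fit)  # any later use of the fit is what triggers the segfault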

I've also tried detaching RStan in the hopes that the DLLs would be automatically unloaded, but they persist even after unloading the package (as expected, given the following in the help for detach: "detaching will not in general unload any dynamically loaded compiled code (DLLs)").
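
For reference, the detach attempt was along these lines (illustrative):

detach("package:rstan", unload=TRUE)
names(getLoadedDLLs())  # the per-model DLLs are still listed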

From this question, "Can Rcpp package DLLs be unloaded without restarting R?", it seems that library.dynam.unload() might have some role in the solution, but I haven't had any success using it to unload the DLLs, and I suspect that after unloading the DLL I'd run into the same segfault as before.

EDIT: adding a minimal, fully-functional example:

The R code:

require(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
for(i in 1:100)
{
    fits[[i]] <- stan(file="gp-sim.stan", data=list(x=x,N=N), iter=1, chains=1)
}

This code requires that the following model definition be in the working directory in a file gp-sim.stan (this model is one of the examples included with Stan):

// Sample from Gaussian process
// Fixed covar function: eta_sq=1, rho_sq=1, sigma_sq=0.1

data {
  int<lower=1> N;
  real x[N];
}
transformed data {
  vector[N] mu;
  cov_matrix[N] Sigma;
  for (i in 1:N)
    mu[i] <- 0;
  for (i in 1:N)
    for (j in 1:N)
      Sigma[i,j] <- exp(-pow(x[i] - x[j],2)) + if_else(i==j, 0.1, 0.0);
}
parameters {
  vector[N] y;
}
model {
  y ~ multi_normal(mu,Sigma);
}

Note: this code takes quite some time to run, as it is creating ~100 Stan models.

Doug Jackson
    I am surprised that another DLL gets loaded for every process. I wonder if it would be easiest to prevent this from happening in the first place. Can you supply a minimal, but fully functional, example of code that captures your problem? – nograpes Jul 18 '14 at 19:28
    That's a (R)Stan design issue and limitation. Rcpp just helps to create the dynamically loadable library; it has no view on whether it is advisable to load 100s of them. Eventually you will hit an OS limit (beyond the hardcoded R limit you identified) I suspect. – Dirk Eddelbuettel Jul 18 '14 at 21:46

2 Answers


I can't speak to the issues regarding DLLs, but you shouldn't need to compile the model each time. You can compile the model once and reuse it, which avoids this problem and speeds up your code.

The function stan is a wrapper around stan_model, which compiles the model, and sampling, which draws samples from it. Run stan_model once to compile the model and save the result in an object, then call sampling on that object whenever you need draws.

require(rstan)

x <- c(1,2)
N <- length(x)

fits <- list()
mod <- stan_model("gp-sim.stan")
for(i in 1:100)
{
    fits[[i]] <- sampling(mod, data=list(x=x,N=N), iter=1, chains=1)
}

This is similar to the problem of running parallel chains, discussed in the RStan wiki. Your code could be sped up by replacing the for loop with something that runs the sampling in parallel.
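
For example, on a Unix-alike the loop could be replaced with mclapply from the base parallel package (a sketch; mc.cores is illustrative, and on Windows you would need parLapply with a cluster instead):

library(parallel)

# Draw from the single compiled model on several cores at once
fits <- mclapply(1:100, function(i)
                 sampling(mod, data=list(x=x, N=N), iter=1, chains=1),
                 mc.cores=max(1, detectCores() - 1))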

jrnold
    For completeness, if you did have a valid reason to load 100 DLLs in an R session, I think you could use the `dyn.unload` function to unload some of them with `dyn.unload(file.path(tempdir(), paste0(get_stanmodel(stanfit)@dso@dso_filename, .Platform$dynlib.ext)))`, where `stanfit` is an object produced by the `sampling` or `stan` functions. Or you could replace `get_stanmodel(stanfit)` with the object produced by `stan_model`. However, you would be very limited as to what you could subsequently do with the `stanfit` object without crashing R (no `monitor`, `print`, `log_prob`, etc.) – Ben Goodrich Jul 19 '14 at 02:15

Here is what I use to run several Stan models in a row (Win10, R 3.3.0).

I needed not only to unload the DLL files but also to delete them and the other temporary files. Also, for me the filenames differed from the ones stored in the stan object (the approach Ben suggested), so I simply scan tempdir() instead.

dso_filenames <- dir(tempdir(), pattern=.Platform$dynlib.ext)
filenames <- dir(tempdir())
# Unload every model DLL that RStan left in the temporary directory
for (i in seq(dso_filenames))
  dyn.unload(file.path(tempdir(), dso_filenames[i]))
# Then delete the temporary files themselves
for (i in seq(filenames))
  if (file.exists(file.path(tempdir(), filenames[i])) && nchar(filenames[i]) < 42) # skip some long-named files that refused to be removed
    file.remove(file.path(tempdir(), filenames[i]))