I want to optimize a function from a package in R using optimParallel. Until now I have only optimized functions that I defined in my own environment, and that worked. But functions from any package don't work, and I get an error. I checked with .libPaths() that the library paths are the same on each node, and I used Sys.info() to check for any differences. Here is an example (it is not meaningful, but it shows my problem):

library(optimParallel)

.libPaths()
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"       

cl <- makeCluster(2) #also tried to set "master" to my IP
clusterEvalQ(cl, .libPaths())
[[1]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library"       

[[2]]
[1] "C:/Users/Name/Documents/R/win-library/3.5" "C:/Program Files/R/R-3.5.1/library" 

setDefaultCluster(cl)
optimParallel(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
Error in checkForRemoteErrors(val) : 
   2 nodes produced errors; first error: object 'C_dnorm' not found

#for comparison 
optim(par=0, dnorm, mean=1, method = "L-BFGS-B")$par
[1] -5.263924

What am I doing wrong?

To Mate
  • From `?optimParallel`: "No documentation for ‘optimParallel’ in specified packages and libraries: you could try ‘??optimParallel’". (It appears the parallel process cannot find the `stats` package. So perhaps you should look at the documentation for the unnamed package to see how you should pass the names of required packages.) – IRTFM Sep 29 '18 at 08:25
  • So I edited your title to reflect the fact that it is the `optimParallel` package, not the Parallel package. Also made it more specific and informative. – IRTFM Sep 29 '18 at 09:00
  • `optimParallel()` uses `parallel::parLapply()`. Does `parLapply(cl=cl, X=list(1,2), dnorm)` lead to the same error? – Nairolf Oct 09 '18 at 22:08
  • No, parLapply does work. I tried `parLapply(cl, X=list(1,2), optim, dnorm)` and this gives the same results as the normal call of optim. It seems that optimParallel can't find the C/C++ code behind the functions (on my laptop). – To Mate Oct 10 '18 at 15:05

2 Answers

Reasoning that your error message indicated that the parallel processes were not getting adequate information, I looked at the examples in the documentation of the optimParallel package. The first one defines a helper function which will carry an environment with it, but it otherwise resembles yours in some respects.

library(optimParallel)
set.seed(123); x <- rnorm(n=1000, mean=1, sd=2)
negll <- function(par, x) -sum(dnorm(x=x, mean=par[1], sd=par[2], log=TRUE))
o1 <- optimParallel(par=c(0, 1), fn=negll, x=x, method="L-BFGS-B", lower=c(-Inf, 0.0001))
o1$par
#[1] 1.032256 1.982398

That example also differs from yours in that it uses data to estimate the parameters. I'm not sure what your result means, whereas I do understand the values returned by the modification of that example that I posted here. The negative log-likelihood for that particular data (not completely reproducible since I forgot to set a seed) is minimized at a mean of 1.126 and an sd of 2.007.

For an example of how to create a situation where the environment of a non-base package gets carried to the workers, see this prior answer: parallel::clusterExport how to pass nested functions from global environment?

IRTFM
  • Sorry, I forgot to include the package, but I have edited it in now. The results from my example don't make any sense; I just wanted to demonstrate my problem. The difference between my example and the example from the documentation is that the negll function is defined in the global environment. My function is in (my own) package. Because I got an error message with my own package, I tried to optimize a function from a base package (dnorm). But the error stays practically the same. – To Mate Sep 29 '18 at 09:41
  • Did you commit the same "errors" in your package as you did in your example, i.e., not using data and not defining a function that would carry an environment to all the workers? – IRTFM Sep 29 '18 at 17:27

Edit: The problem is solved in optimParallel version 0.7-4

The version is available on CRAN: https://CRAN.R-project.org/package=optimParallel


For older versions:

A workaround is to wrap dnorm() into a function defined in the .GlobalEnv.

library("optimParallel")
cl <- makeCluster(2) 
setDefaultCluster(cl)
f <- function(x, mean) dnorm(x, mean=mean)
optimParallel(par=0, f, mean=1, method="L-BFGS-B")$par
[1] -5.263924
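For a function that lives in your own package, the same idea can be sketched with an explicit namespace prefix. This is only a sketch: `stats::dnorm` stands in for a hypothetical `mypkg::myfun`, and it assumes the package is installed on every worker. The check below runs serially and only confirms that the wrapper behaves like the original function.

```r
# Wrapper defined in the global environment; the `pkg::` prefix makes the
# call resolve inside the package namespace.
# `stats::dnorm` is a stand-in for a hypothetical `mypkg::myfun`.
f_wrapped <- function(x, mean) stats::dnorm(x, mean = mean)

# Serial sanity check: the wrapper returns the same values as dnorm() itself.
xs <- c(-1, 0, 2.5)
same_values <- isTRUE(all.equal(f_wrapped(xs, mean = 1), dnorm(xs, mean = 1)))
print(same_values)
```

`f_wrapped` can then be passed to optimParallel() in the same way as `f` in the workaround above.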

A more difficult task is to explain why the problem occurs:

  • optimParallel() uses parallel::parLapply() to evaluate f.
  • parLapply() has the arguments cl, X, fun.
  • If we used parLapply() without pre-processing the arguments passed via ... of optimParallel(), f could not have arguments named cl, X, or fun, because this would cause errors like:

    Error in lapply(X = x, FUN = f, ...) (from #2) : 
    formal argument "X" matched by multiple actual arguments
    
  • Simply speaking, optimParallel() avoids this error by removing all arguments from f, putting them into an environment and evaluating f in that environment.
  • One problem of that approach occurs when f is defined in another R package and links to compiled code. That case is illustrated in the question above.
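Both effects described in the list above can be reproduced in plain R without a cluster. The following is a minimal sketch, not the actual optimParallel internals: `naive_eval()` and `g()` are hypothetical helpers, and resetting the environment of a copy of dnorm() is only assumed to mimic what happens when the function is evaluated outside the stats namespace.

```r
# 1) Name clash: forwarding `...` to lapply() fails as soon as `...`
#    contains an argument called X.
naive_eval <- function(f, par, ...) lapply(X = list(par), FUN = f, ...)
g <- function(par, X) par + X   # hypothetical objective with an argument named X
clash <- tryCatch(naive_eval(g, 0, X = 1),
                  error = function(e) conditionMessage(e))
print(clash)

# 2) Lost namespace: dnorm() calls compiled code through the object C_dnorm,
#    which exists only inside the stats namespace. A copy of dnorm() whose
#    environment no longer inherits from that namespace cannot find it.
f <- stats::dnorm
environment(f) <- new.env(parent = globalenv())
lost <- tryCatch(f(0), error = function(e) conditionMessage(e))
print(lost)
```

The second message names the same missing object, C_dnorm, as the error reported in the question.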

Suggestions for better approaches to handle the issue are welcome. I opened a corresponding question here. As long as there is no better solution, one can use the workaround illustrated above.

Nairolf