
I am using R 3.0.1 on both Windows 7 and Linux (SUSE Server 11, x86_64). The following example code produces an error on Windows but not on Linux. All the packages involved are up to date on both machines. The Windows error is:

Error in { : task 1 failed - "NULL value passed as symbol address"

If I change %dopar% to %do%, the code runs on Windows without any errors. My initial guess was that this was some configuration issue on Windows, so I tried reinstalling Rcpp and R, but that did not help. The error seems to be related to scoping: if I define and compile the function cFunc inside f1, then %dopar% works, but, as expected, it is very slow since the compiler is called once for every task.

Does anyone have some insights on why the error happens or suggestions on how to fix it?

library(inline)
sigFunc <- signature(x="numeric", size_x="numeric")
code <- '
  double tot = 0;
  for (int k = 0; k < INTEGER(size_x)[0]; k++) {
    tot += REAL(x)[k];
  }
  return ScalarReal(tot);
'
cFunc <- cxxfunction(sigFunc, code)

f1 <- function() {
  x <- rnorm(100)
  a <- cFunc(x=x, size_x=as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
registerDoParallel()
# this produces an error in Windows but not in Linux
res <- foreach(counter=(1:100)) %dopar% {f1()}
# this works for both Windows and Linux
res <- foreach(counter=(1:100)) %do% {f1()}

# Not a practical solution, but if I compile cFunc inside f1, this works on Windows; it is just very slow
f1 <- function() {
  library(inline)
  sigFunc <- signature(x="numeric", size_x="numeric")

  code <- '
    double tot = 0;
    for (int k = 0; k < INTEGER(size_x)[0]; k++) {
      tot += REAL(x)[k];
    }
    return ScalarReal(tot);
  '
  cFunc <- cxxfunction(sigFunc, code)
  x <- rnorm(100)
  a <- cFunc(x=x, size_x=as.integer(length(x)))
  return(a)
}
# this now works in Windows but is very slow
res <- foreach(counter=(1:100)) %dopar% {f1()}

Thanks! Gustavo


2 Answers


The error message "NULL value passed as symbol address" is unusual, and isn't due to the function not being exported to the workers. The cFunc function just doesn't work after being serialized, sent to a worker, and unserialized. It also doesn't work when it's loaded from a saved workspace, which results in the same error message. That doesn't surprise me much, and it may be a documented behavior of the inline package.
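A quick sketch of what I mean, assuming cFunc has already been compiled in the current session as in the question: round-tripping it through serialization on the master is enough to reproduce the error, with no parallel backend involved.

# serialize() and unserialize() are base R functions; the compiled function's
# external pointer doesn't survive the round trip, so calling the copy fails
cFunc2 <- unserialize(serialize(cFunc, NULL))
cFunc2(x = rnorm(10), size_x = 10L)
# Error: NULL value passed as symbol address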

As you've demonstrated, you can work around the problem by creating cFunc on the workers. To do that efficiently, it should be created only once per worker. With the doParallel backend, I would define a worker initialization function and execute it on each of the workers using the clusterCall function:

worker.init <- function() {
  library(inline)
  sigFunc <- signature(x="numeric", size_x="numeric")
  code <- '
    double tot = 0;
    for (int k = 0; k < INTEGER(size_x)[0]; k++) {
      tot += REAL(x)[k];
    }
    return ScalarReal(tot);
  '
  # compile cFunc once on this worker and put it in the worker's global
  # environment so that f1 can find it inside the %dopar% loop
  assign('cFunc', cxxfunction(sigFunc, code), .GlobalEnv)
  NULL
}

f1 <- function(){
  x <- rnorm(100)
  a <- cFunc(x=x, size_x=as.integer(length(x)))
  return(a)
}

library(foreach)
library(doParallel)
cl <- makePSOCKcluster(3)
clusterCall(cl, worker.init)
registerDoParallel(cl)
res <- foreach(counter=1:100) %dopar% f1()

Note that you must create the PSOCK cluster object explicitly in order to call clusterCall.

The reason your example worked on Linux is that registerDoParallel() without an argument uses the mclapply function there, while on Windows it creates a cluster object and uses the clusterApplyLB function. Because mclapply forks the master process, functions and variables aren't serialized and sent to the workers, so there is no error.
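If you want to confirm this on Linux, here is a sketch (not something from the question itself): register a cluster backend explicitly, so cFunc has to be serialized to the workers, and the same error should appear there as well.

cl <- makePSOCKcluster(2)                    # cluster backend instead of mclapply
registerDoParallel(cl)
res <- foreach(counter = 1:4) %dopar% f1()   # expected to fail as it does on Windows
stopCluster(cl)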

It would be nice if doParallel included support for initializing the workers without the need for using clusterCall, but it doesn't yet.

Steve Weston
  • +1 What a nice way to export a function. Could you alternatively use the `.export` argument of `foreach`? – Simon O'Hanlon Aug 15 '13 at 06:04
  • @SimonO101 If you use `.verbose=TRUE`, you will see that `cFunc` is auto-exported, explaining why you don't get the usual "object not found" error message. I think there is state on the master that isn't included in the serialized function `cFunc` which requires more work. – Steve Weston Aug 15 '13 at 12:55

The easiest 'workaround' I can think of would be to:

1) Write your code in a separate source file, say cFunc.c,

2) Compile it with R CMD SHLIB (see the sketch after the example below),

3) dyn.load the resulting shared library and .Call the function within your foreach call.

For example,

cFunc.c
=======

#include <R.h>
#include <Rinternals.h>

SEXP cFunc( SEXP x, SEXP size_x ) {

  double tot = 0;
  for (int k=0; k < INTEGER(size_x)[0]; ++k ) {
    tot += REAL(x)[k];
  }
  return ScalarReal(tot);

}

and

library(foreach)
library(doParallel)
registerDoParallel()
x <- as.numeric(1:100)
size_x <- as.integer(length(x))
res <- foreach(counter=(1:100)) %dopar% {
  dyn.load("cFunc.dll")   # on Linux the shared library would be "cFunc.so"
  .Call("cFunc", x, size_x)
}
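Step 2 can be run from inside R as well; this is just a sketch that assumes cFunc.c sits in the current working directory (the shared library it produces is cFunc.dll on Windows and cFunc.so on Linux).

# equivalent to running "R CMD SHLIB cFunc.c" in a shell; it builds the
# shared library that dyn.load() picks up in the loop above
system("R CMD SHLIB cFunc.c")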

Alternatively (and probably better), consider building an actual package with this function exported that you can use.
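If you do go the package route, one possible starting point (assuming Rcpp is installed; the package name below is made up) is Rcpp.package.skeleton(); the C source then goes into src/ and the calling wrapper is exported from the NAMESPACE.

library(Rcpp)
# generates a minimal package layout; move cFunc.c (or an Rcpp rewrite of it)
# into src/ and export the wrapper function from NAMESPACE
Rcpp.package.skeleton("cFuncPkg")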

Kevin Ushey
  • Repeat after me: Write. A. Package. It really is the sanest way to organize code. – Dirk Eddelbuettel Aug 15 '13 at 15:05
  • I agree with Dirk. Writing a package solves many problems associated with parallel computing in R, along with many other benefits. – Steve Weston Aug 15 '13 at 15:17
  • Writing a package sounds good, but it is really a pain to run the code on several machines. Every time the cpp code changes, you have to update the package on all the machines. Not good. – Feng Jiang May 31 '18 at 01:55
  • @Jfly: But if you write a script that updates your package on all machines, it is no extra hassle. – quickreaction Jul 05 '19 at 18:40