After playing around for some time with R's parallel package on my Debian-based machine, I still can't find a way to remove all zombie child processes after a computation.

I'm searching for a general and OS independent solution.

Below is a simple script illustrating the problem for 2 cores:

library(parallel)
testfun <- function(){TRUE}

# Fork workers where supported; Windows only supports socket clusters
cltype <- ifelse(.Platform$OS.type != "windows", "FORK", "PSOCK")
cl <- makeCluster(2, type = cltype)
p <- clusterCall(cl, testfun)
stopCluster(cl)

Unfortunately, this script leaves two zombie processes in the process table, which are only removed when the R session exits.

user625626

2 Answers


This only seems to be an issue with "FORK" clusters. If you make a "PSOCK" cluster instead, the processes will die when you call stopCluster(cl).

Is there anything preventing you from using a "PSOCK" cluster on your Debian-based machine?
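For reference, a minimal sketch of the same test with a "PSOCK" cluster — identical to the question's script except for the cluster type:

```r
library(parallel)

testfun <- function() TRUE

# PSOCK workers are independent R processes launched over sockets;
# they exit when stopCluster() closes their command connection.
cl <- makeCluster(2, type = "PSOCK")
p <- clusterCall(cl, testfun)
stopCluster(cl)  # workers terminate; no zombies left in the process table
```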

Joshua Ulrich
  • Hi Josh, sorry for my late reply - you are right, this only seems to be a problem for FORK clusters. PSOCK clusters also work on my Debian machine - I just thought forking would be faster. Thanks a lot! – user625626 Mar 05 '12 at 19:02
  • 2
    This seems to be a silly oversight with FORK clusters. I've posted a bug report at https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15471 . Zombie processes are mostly harmless because they consume no resources. They are just sitting in the process table so that the parent process can examine their exit status. Examining their exit status with `library(fork) wait()` will clean up the zombies one at a time (and print the exit status of each). – computermacgyver Sep 24 '13 at 07:33
  • 1
    The fork package is no longer available. – russellpierce May 27 '15 at 12:09
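With the fork package gone, one way to see whether zombies are still attached to the session is to ask ps directly from R. A sketch, assuming a Linux-style ps (the flags may differ on other platforms):

```r
# List the children of this R process with their state;
# zombies show "Z" in the STAT column.
children <- system(
  sprintf("ps --ppid %d -o pid,stat,comm", Sys.getpid()),
  intern = TRUE
)
print(children)
```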

The answer to your problem is probably in the help file for the makeCluster() command.

At the bottom of the file it says: "It is good practice to shut down the workers by calling stopCluster: however the workers will terminate themselves once the socket on which they are listening for commands becomes unavailable, which it should if the master R session is completed (or its process dies)."

The solution (it works for me) is to define a port for your cluster when you create it.

cl <- makeCluster(2, type = cltype, port = yourPortNumber)

Another (possibly less useful) option is to set a timeout for the sockets; the timeout value is in seconds.

cl <- makeCluster(2, type = cltype, port = yourPortNumber, timeout = 50)

In any case, the aim should be to make the socket connection unavailable. Either closing the port or terminating the main R process will do this.

Edit: What I meant was to close the ports on which the processes are listening; this should be OS independent. You can use showConnections(all = TRUE) to list all connections, and then try closeAllConnections().
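A sketch of that approach; note that closeAllConnections() closes every open connection in the session (files included), not just the cluster sockets, so treat it as a last resort:

```r
# Show every connection the master R session holds, including
# the sockets on which cluster workers are listening
print(showConnections(all = TRUE))

# Close them all; workers waiting on those sockets should then
# terminate on their own, as the help page describes
closeAllConnections()
```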

Sorry if this doesn't work either.

zipizip