
I'm trying to set up an asynchronous system() call in R. For this to be useful, the user needs a way to check whether the process has ended. The question is how to test for that. In "how to run an executable file and then later kill or terminate the same process with R in Windows", the suggestion seems to be to capture all PIDs before and after the system() call to get the PID of the just-launched process (which can then be used to test whether it has ended). That seems an error-prone way of doing it, on top of being OS dependent...

Are there other approaches to this problem? (They don't have to involve PIDs.)

Edit: the scenario this should be used in:

I'm developing a Shiny GUI that handles potentially very long-running calculations implemented in Java. The calculations are done in batches. During these runs R is idle, but the GUI is still locked from interacting with the R server because it is waiting for the Java process to finish. I want a way to start the Java process without waiting for it to finish (using the wait=FALSE parameter), but with a fail-safe way of checking that it has completed, so the GUI can be updated accordingly...

ThomasP85
  • Asynchronous with R: good luck! I think it is just not designed for it. – agstudy Jun 16 '14 at 11:36
  • You have to push the boundaries you know : ). Furthermore it is not code within the R process that needs to run asynchronously, so the single-threaded nature of R doesn't really make a difference... – ThomasP85 Jun 16 '14 at 11:39
  • This is very hacky and I guess (I hope) there are plenty of better ways of doing it, so I'm not giving this as an answer. But why not add a command to your system command that creates or prints something to a file when the process is done, and have your script check for that file? Something like `system("top;echo 'done' > 'done.txt'")` and in your script `while(!file.exists('done.txt'))`, for instance. – plannapus Jun 16 '14 at 11:48
  • I have an example of controlling a process: https://github.com/ropensci/RSelenium/blob/master/R/util.R#L183 . The process is started with a `system` call, the resulting `PID` is recorded, and `tools::pskill` is used to terminate the process later. Getting the `PID` is system dependent (easy on Linux, not as easy on other OSes). – jdharrison Jun 16 '14 at 12:00
  • I agree with @plannapus. It's similar in concept to "lock files", which many (Windows, OS X, Linux) apps use to control read/write access during operation or to verify subprocess completion. – Carl Witthoft Jun 16 '14 at 12:05
  • Also have a look at the `runr` package from @Yihui https://github.com/yihui/runr – jdharrison Jun 16 '14 at 12:06
  • I thought of a setup similar to @plannapus's but, as was said, it too is very hacky. I hoped a better (R-supported) solution was available. – ThomasP85 Jun 16 '14 at 12:17
  • @ThomasP85 well, can you give us more details about the context of using R as asynchronous system? I mean can you add a scenario or a use case of your workflow? – agstudy Jun 16 '14 at 12:42
  • I would second the vote for something analogous to lockfiles. I don't see any other way to do it. – Ben Bolker Jun 16 '14 at 13:56
  • I have posted a follow-up question building on @plannapus's idea. If any of you have some insight I would be grateful: http://stackoverflow.com/questions/24257271/r-using-wait-false-in-system-with-multiline-commands – ThomasP85 Jun 17 '14 at 06:50
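plannapus's sentinel-file idea can be sketched as follows. This is a minimal illustration in Python (the language also used in one of the answers), not R. It assumes a POSIX `sh` and uses a hypothetical `sleep 1` as a stand-in for the Java batch. In R, the launch would be the `system(..., wait = FALSE)` call from the comment and the check would be `file.exists()`. Writing the job's exit status instead of a fixed `'done'` also tells the GUI whether the job succeeded. Writing to a temporary name and renaming it ensures the checker never reads a half-written file.

```python
import os
import subprocess
import tempfile
import time

# Hypothetical stand-in for the long-running job (the Java batch in the question).
JOB = "sleep 1"

workdir = tempfile.mkdtemp()
sentinel = os.path.join(workdir, "done.txt")

# Rough counterpart of R's system("<job>; echo $? > done.txt", wait = FALSE):
# start a POSIX shell without waiting for it. Once the job exits, the shell
# writes the job's exit status to a temporary file and renames it into place,
# so the sentinel never appears half-written. Inside the script, $1 is the
# sentinel path passed as an argument (avoids quoting problems with paths).
launcher = subprocess.Popen(
    ["sh", "-c", JOB + '; echo $? > "$1.tmp" && mv "$1.tmp" "$1"', "sh", sentinel]
)


def job_finished(path):
    """Non-blocking check: the sentinel only exists after the job has exited."""
    return os.path.exists(path)


# A GUI would re-run this check on a timer; here we simply poll in a loop.
while not job_finished(sentinel):
    time.sleep(0.1)

with open(sentinel) as f:
    exit_status = int(f.read())
print("job finished with exit status", exit_status)

launcher.wait()  # reap the shell process
```

The drawback raised in the comments remains: if the wrapper shell itself is killed, the sentinel never appears. A GUI may therefore want a timeout as well.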

3 Answers


If we are willing to call system to launch a process, then this question really simplifies to: how can one get the PID of a just-launched process in Windows? This isn't so much an R-specific problem as a general Windows problem. Looking through some answers here on SO, I see one that seems reasonable: process the string result and extract your PID.

Alternatively, you can launch another Rscript asynchronously with system(..., wait=FALSE) and have it report its own PID back to the host process (via a file or socket). That Rscript could in turn make your system call and then self-terminate when the system call is done. Then you just watch the PID of the wrapping Rscript. This is slightly better than a lock file because you don't have to rely on the process you've called completing successfully and cleaning up its own lock file.

I'd take some exception to agstudy's solution (even if it were on point). As ThomasP85 says, you can get asynchronous operation via system(..., wait=FALSE). You can also get asynchronous operation via sockets, e.g. svSocket and parallel (via a single-node PSOCK cluster). Neither of those socket-based approaches is fundamentally different from what the deferred calls launched in Python via rpy2 in agstudy's answer accomplish.

russellpierce

Try the processx package. With it, you run a new process in the background and can check its status with a built-in method:

library(processx)
p <- process$new(command = "sleep", args="10")
p$is_alive()
   [1] TRUE
# After 10 seconds
p$is_alive()
   [1] FALSE
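processx's `is_alive()` checks the child through a handle the parent keeps, without blocking, so there is no need to guess PIDs. For comparison only, Python's standard library (the language of the Twisted answer) offers the same pattern through `subprocess.Popen.poll()`. It returns `None` while the child is running and its exit code afterwards. This is a minimal sketch that assumes a POSIX `sleep` command on the PATH; `is_alive` here is a hypothetical helper named after the processx method.

```python
import subprocess
import time

# Start the child without waiting for it (like process$new() in processx).
# Assumes a POSIX `sleep` executable is available.
p = subprocess.Popen(["sleep", "2"])


def is_alive(proc):
    """Non-blocking liveness check, analogous to processx's p$is_alive()."""
    return proc.poll() is None


alive_at_start = is_alive(p)
print(alive_at_start)  # True while `sleep 2` is still running

time.sleep(3)
alive_later = is_alive(p)
print(alive_later)  # False once the child has exited
print(p.returncode)  # 0, recorded by poll() when it saw the child exit
```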
Sam

R does not support multithreading, so calling it asynchronously is not easy. One way to do it is to call R from a language that supports asynchronous calls, for example Python:

  • Use twisted for asynchronous operations.
  • Use rpy2 to call R functions from Python functions.

Here is a complete example that calls two R functions asynchronously:

  1. given a radius, it computes the perimeter after sleeping for a random time
  2. given the perimeter computed in step 1, it gives back the original radius

It is a dummy example, but it is flexible enough to be extended to any R script.

from twisted.internet.defer import Deferred
from twisted.internet import reactor
import rpy2.robjects as robjects

## define the functions
def get_perimeter(r):
    # define an R function that sleeps 1-3 seconds, then returns the perimeter
    robjects.r('''
        f <- function(r) {
            Sys.sleep(sample(1:3, 1))
            2 * pi * r
        }
        ''')
    r_f = robjects.r['f']
    result = r_f(r)  # call once; calling it twice would also sleep twice
    print(result)
    return result

def get_radius(p):
    # define an R function that recovers the radius from the perimeter
    robjects.r('''
        f <- function(p) p / 2 / pi
        ''')
    r_f = robjects.r['f']
    result = r_f(p)
    print(result)
    return result

def job_done(_):
    reactor.stop()

## chain the calls on a Deferred
d = Deferred()
d.addCallback(get_perimeter)
d.addCallback(get_radius)
d.addCallback(job_done)

# fire the chain with a radius of 25410 once the reactor is running
reactor.callWhenRunning(d.callback, 25410)
reactor.run()
agstudy
  • It is not R functions that need to run asynchronously. It is a system command (which is already supported with the 'wait' parameter). What I'm looking for is a way to check whether the process started with the system() call has finished... Sorry if that wasn't clear in the initial question... – ThomasP85 Jun 16 '14 at 13:26