Replacement functions in R that don't take input

Question

This seems closely related to several other questions that have been asked (this one for example), but I can't quite figure out how to do exactly what I want. Maybe replacement functions are the wrong tool for the job, which would also be a perfectly acceptable answer. I am much more familiar with Python than R: I can easily see how I would do this in Python, but I can't get my head around how to approach it in R.

The problem: I am trying to modify an object in place within a function, without having to return it, but I don't need to pass in the value that modifies it, because this value is the result of a function call that's already contained in the object.

More specifically, I have a list (technically it's an S3 class, but I don't think that's relevant to this issue) that holds some things relating to a process started with a processx::process$new() call. For reproducibility, here's a command that writes a toy shell script, naw.sh, followed by the code to get my res object:

echo '
echo $1
sleep 1s
echo "naw 1"
sleep 1s
echo "naw 2"
sleep 1s
echo "naw 3"
sleep 1s
echo "naw 4"
sleep 1s
echo "naw 5"
echo "All done."
' > naw.sh

Then my wrapper is something like this:

run_sh <- function(.args, ...) {
  p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
  return(list(process = p, orig_args = .args, output = NULL))
}

res <- run_sh(c("naw.sh", "hello"))

And printing res gives something like:

$process
PROCESS 'sh', running, pid 19882.

$orig_args
[1] "naw.sh" "hello"

$output
NULL

The specific issue here is a bit peculiar to processx, but I think the general principle applies more broadly. I want to collect all the output from this process after it has finished, but you can only call res$process$read_all_output_lines() (or its sibling functions) once: the first call returns the contents of the buffer, and subsequent calls return nothing. Also, I am going to start a bunch of these and then come back to "check on them", so I can't just call res$process$read_all_output_lines() right away, because it blocks until the process finishes, which is not what I want.

So I'm trying to store the output of that call in res$output and then just return that stored copy on subsequent calls. In other words, I need a function that modifies res in place with res$output <- res$process$read_all_output_lines().

Here's what I tried, based on guidance like this, but it didn't work.

get_output <- function(.res) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    invisible()
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    return(.res$output)
  }
}

`update_output<-` <- function(.res, ..., value) {
  .res$output <- value
  .res
}

Calling get_output(res) works the first time, but it does not store the output in res$output to be accessed later, so subsequent calls return nothing.
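The reason this fails is R's copy-on-modify semantics: inside get_output(), .res is a local binding, so update_output(.res) <- output only rebinds that local copy and the caller's res is never touched. A minimal base-R sketch of the same effect (set_field() and obj are hypothetical names used only for illustration):

```r
# A function that appears to modify its list argument
set_field <- function(x) {
  x$output <- "filled in"  # changes the function's local copy only
  invisible(x)
}

obj <- list(output = NULL)
set_field(obj)
is.null(obj$output)    # TRUE: the caller's obj is unchanged

obj <- set_field(obj)  # reassigning the returned value is what persists it
obj$output             # "filled in"
```

This is the same reason the "return the modified object" approach requires the caller to write res <- get_output(res).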

I also tried something like this:

`get_output2<-` <- function(.res, value) {
  # check if process is still alive (as of now, can only get output from finished process)
  if (.res$process$is_alive()) {
    warning(paste0("Process ", .res$process$get_pid(), " is still running. You cannot read the output until it is finished."))
    .res
  } else {
    # if output has not been read from buffer, read it
    if (is.null(.res$output)) {
      output <- .res$process$read_all_output_lines()
      update_output(.res) <- output
    }
    # return output
    print(value)
    .res
  }
}

This just throws away value, which feels silly, because you have to call it with an assignment like get_output2(res) <- "fake", which I hate.
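For context, R rewrites a call like get_output2(res) <- "fake" into roughly res <- `get_output2<-`(res, value = "fake"). That rewrite, with its assignment back to res in the caller, is the only thing that makes a replacement function look like it modifies in place, and it is also why the value argument cannot be dropped. A toy illustration (`second<-` is a hypothetical function, not from the question):

```r
# A toy replacement function; R requires the last argument to be named `value`
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 99                 # the sugared form
v                               # 1 99 3

w <- c(1, 2, 3)
w <- `second<-`(w, value = 99)  # approximately what R evaluates for the line above
identical(v, w)                 # TRUE
```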

Obviously I could also just return the modified res object, but I don't like that because then the user has to know to do res <- get_output(res) and if they forget to do that (the first time) then the output is lost to the ether and can never be recovered. Not good.

Any help is much appreciated!

"I am trying to modify an object in place within a function, without having to return it" R is a functional language. Learn the idioms and your experience with R will be vastly enhanced — Hong Ooi, Feb 04 '20 at 17:39
@HongOoi can you make any suggestions on how to use these idioms given my problem and constraints? — seth127, Feb 04 '20 at 19:01
It sounds like you should learn R6 classes for OO programming, see the rhub package for example: https://github.com/r-hub/rhub or `processx` itself. — Brian, Feb 04 '20 at 21:22
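On the R6 suggestion: R6 objects are built on environments, and environments, unlike lists, are not copied when modified, so a value cached inside a function stays visible to the caller. A minimal base-R sketch of that idea, swapping R6 for a plain environment (new_job(), get_job_output() and fake_finished_process() are hypothetical names, not part of processx):

```r
# Hypothetical helpers: a job record stored in an environment, so updates
# made inside a function are visible to every later caller.
new_job <- function(process, orig_args) {
  job <- new.env()
  job$process <- process
  job$orig_args <- orig_args
  job$output <- NULL  # filled in on the first successful read
  job
}

get_job_output <- function(job) {
  if (job$process$is_alive()) {
    warning("Process is still running; output is not available yet.")
    return(invisible(NULL))
  }
  # The buffer can only be read once, so read it once and cache the result.
  # Because job is an environment, the cached value sticks for the caller.
  if (is.null(job$output)) {
    job$output <- job$process$read_all_output_lines()
  }
  job$output
}

# Hypothetical stand-in for a finished processx process whose output
# buffer is empty after the first read
fake_finished_process <- function(lines) {
  list(
    is_alive = function() FALSE,
    read_all_output_lines = function() {
      out <- lines
      lines <<- character(0)  # later reads return nothing, like the real buffer
      out
    }
  )
}

job <- new_job(fake_finished_process(c("hello", "All done.")), c("naw.sh", "hello"))
get_job_output(job)  # "hello" "All done."
get_job_output(job)  # the same again, served from the cache
```

With processx itself you would presumably construct the job as new_job(processx::process$new("sh", c("naw.sh", "hello"), stdout = "|", stderr = "2>&1"), c("naw.sh", "hello")) and call get_job_output() as often as needed, with no reassignment.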

score 1 · Answer 1 · answered Feb 04 '20 at 17:26

I may be missing something here, but why don't you just write the output after you create the object so that it's there the first time the function returns?

run_sh <- function(.args, ...) 
{
  p <- processx::process$new("sh", .args, ..., stdout = "|", stderr = "2>&1")
  return(list(process = p, orig_args = .args, output = p$read_all_output_lines()))
}

So now if you do

res <- run_sh(c("naw.sh", "hello"))

You get

res
#> $`process`
#> PROCESS 'sh', finished.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#>  [1] "hello"                                    
#>  [2] "naw.sh: line 2: sleep: command not found" 
#>  [3] "naw 1"                                    
#>  [4] "naw.sh: line 4: sleep: command not found" 
#>  [5] "naw 2"                                    
#>  [6] "naw.sh: line 6: sleep: command not found" 
#>  [7] "naw 3"                                    
#>  [8] "naw.sh: line 8: sleep: command not found" 
#>  [9] "naw 4"                                    
#> [10] "naw.sh: line 10: sleep: command not found"
#> [11] "naw 5"                                    
#> [12] "All done."

oh, sorry. I forgot to mention that I don't want to wait for the process to finish. Like, I want to be able to trigger a bunch of these to run in the background and then be able to check back up on them later with something like `check_output(res1)`. If I call `read_all_output_lines()` immediately then it will wait to return until the process has finished. — seth127, Feb 04 '20 at 18:24
@seth127 I thought it best to just write a new answer based on the updated info. — Allan Cameron, Feb 04 '20 at 20:27

Allan Cameron · Accepted Answer · answered Feb 04 '20 at 20:26

After further information from the OP, it looks as if what is needed is a way to write to the existing variable in the environment that calls the function. This can be done with non-standard evaluation:

check_result <- function(process_list) 
{ 
  # Capture the name of the passed object as a string
  list_name <- deparse(substitute(process_list))

  # Check the object exists in the calling environment
  if(!exists(list_name, envir = parent.frame()))
     stop("Object '", list_name, "' not found")

  # Create a local copy of the passed object in function scope
  copy_of_process_list <- get(list_name, envir = parent.frame())

  # If the process has completed and its output has not been stored yet,
  # read the (one-shot) output buffer into the copy and assign the copy
  # to the name of the object in the calling frame
  if(length(copy_of_process_list$process$get_exit_status()) > 0 &&
     is.null(copy_of_process_list$output))
  {
    copy_of_process_list$output <- copy_of_process_list$process$read_all_output_lines()
    assign(list_name, copy_of_process_list, envir = parent.frame()) 
  }
  print(copy_of_process_list)
}

This will update res if the process has completed; otherwise it leaves it alone. In either case it prints out the current contents. If this is client-facing code you will want further type-checking logic on the object passed in.
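The core of this trick can be seen in isolation: deparse(substitute(x)) recovers the name the caller used, and assign(..., envir = parent.frame()) writes a new value under that name in the caller's environment. A minimal sketch (mark_done() is a hypothetical example, not part of the answer):

```r
# Capture the caller's variable name, then overwrite that variable
mark_done <- function(x) {
  var_name <- deparse(substitute(x))  # e.g. "task"
  x$done <- TRUE                      # modify the local copy
  assign(var_name, x, envir = parent.frame())
  invisible(x)
}

task <- list(done = FALSE)
mark_done(task)
task$done  # TRUE, even though the result was never reassigned
```

Because the name comes from deparse(substitute(x)), this only works when a bare variable name is passed; for an argument like jobs[[1]] the captured string is "jobs[[1]]", which is not a variable, which is why check_result() checks exists() before doing anything.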

So I can do

res <- run_sh(c("naw.sh", "hello"))

and if I check the contents of res, I have:

res
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> NULL

and if I immediately run:

check_result(res)
#> $`process`
#> PROCESS 'sh', running, pid 1112.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> NULL

we can see that the process hasn't completed yet. However, if I wait a few seconds and call check_result again, I get:

check_result(res)
#> $`process`
#> PROCESS 'sh', finished.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
#> [7] "All done."

and without explicitly writing to res, it has updated via the function:

res
#> $`process`
#> PROCESS 'sh', finished.
#> 
#> $orig_args
#> [1] "naw.sh" "hello" 
#> 
#> $output
#> [1] "hello"     "naw 1"     "naw 2"     "naw 3"     "naw 4"     "naw 5"    
#> [7] "All done."

Thank you very much. This all makes sense, but it's feeling more and more like I'm just trying to do something that R isn't designed to do: specifically, be object-oriented. This seemed like a general question, but I think it's really just the quirk of `read_all_output_lines()` only being callable once that's making me want to do this. Thank you just the same. This seems like a good solution. — seth127, Feb 04 '20 at 21:10
Thanks @seth127 . The processx package itself uses the R6 object-oriented system, which feels to me like the nearest R has to genuine object-orientation. It works well for small to medium projects. — Allan Cameron, Feb 04 '20 at 21:47
