
I have a strange environment/scoping dynamic that I've been trying to figure out, and I'm looking for the right or recommended method for achieving this.

I've made a toy example of my problem below, purely for illustration. (I'm aware this particular problem can be solved much more simply, but it illustrates the dynamic I'm trying to implement.)

Current functioning code:

master_function <- 
  function(x, iter = 100){
    # defined inside master_function so `<<-` updates this frame's `x`
    x_p1 <- function(){ x <<- x + 1 }
    x_m1 <- function(){ x <<- x - 1 }

    path <- numeric(iter)
    for(i in 1:iter){
      next_step <- sample(c('p', 'm'), 1)
      if(next_step == 'p'){
        x_p1()
      } else { 
        x_m1()
      }
      path[i] <- x
    }
    path
  }

The issue with this code (especially for a genuinely difficult problem) is that it makes debugging the contents of x_p1 and x_m1 with the RStudio debug utility impossible, since breakpoints cannot be placed inside functions defined within another function's body.

I'm hoping to restructure the code to look something like this:

master_function <- 
  function(x, iter = 100){
    master_env <- environment()
    path <- numeric(iter)
    for(i in 1:iter){
      next_step <- sample(c('p', 'm'), 1)
      if(next_step == 'p'){
        x_p1(master_env)
      } else { 
        x_m1(master_env)
      }
      path[i] <- x
    }
    path
  }

x_p1 <- function(env){ assign('x', get('x', envir = env) + 1, envir = env) }
x_m1 <- function(env){ assign('x', get('x', envir = env) - 1, envir = env) }

But this is also quite ugly. Is there a way to augment the search path, for example, such that access to the master_env is cleaner?
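
For comparison, the same idea written with evalq() is a little tidier, though it still passes the environment around explicitly (a sketch of the same get/assign pattern):

x_p1 <- function(env){ evalq(x <- x + 1, env) }
x_m1 <- function(env){ evalq(x <- x - 1, env) }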

Edit: More information, as requested by @MrFlick. Essentially, I have a simulation with a lot of moving pieces. As it progresses, different events (the sub-functions being referenced) are triggered, modifying the state of the simulation. These functions currently modify many different state objects on each call. Since the functions are created within the master function call, I can take advantage of lexical scoping and the <<- operator, but I lose the ability to debug within those functions.

I'm trying to figure out how to create those functions outside of the master simulation. If I understand correctly, making the functions consume the simulation state and return a modified version comes at a large memory cost.
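
One direction that seems to fit (a sketch built on the toy example above, not the real simulation): define the helpers at top level, then rebind their enclosing environment inside the master function. The <<- assignments then target the master frame, while the helpers remain ordinary top-level functions that can contain browser() calls.

x_p1 <- function(){ x <<- x + 1 }
x_m1 <- function(){ x <<- x - 1 }

master_function <-
  function(x, iter = 100){
    # environment(f) <- environment() makes a local copy of each helper
    # whose enclosing environment is this frame, so `<<-` finds this `x`
    environment(x_p1) <- environment()
    environment(x_m1) <- environment()

    path <- numeric(iter)
    for(i in 1:iter){
      if(sample(c('p', 'm'), 1) == 'p') x_p1() else x_m1()
      path[i] <- x
    }
    path
  }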

jameselmore
  • I can't understand what the requirements are from this example. It doesn't seem very illustrative. Maybe say in words exactly what you are hoping to accomplish? You seem to want to optimize for debugging, but I'm not really sure what that means in this case. – MrFlick Mar 20 '19 at 20:26
  • @MrFlick edited the question with more detail – jameselmore Mar 20 '19 at 20:37
  • I don't see why you don't just track all your state information in a list, passing the current state to your functions, which can return an updated state list if needed. This would be much more R-like. Only values that change will be updated in memory. Do you have an example that shows the large memory cost? – MrFlick Mar 20 '19 at 20:56

2 Answers


1) trace Use trace() to insert debug statements after the definitions of x_p1 and x_m1; one can then step through them when master_function is run.

trace(master_function, at = 4, quote({debug(x_p1); debug(x_m1) }))

untrace(master_function) turns this off. Use body(master_function)[4] to see which line corresponds to 4. See ?trace for more.
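
To see how the steps are numbered (a quick check, assuming the original definition of master_function above), list the body; the opening { counts as step 1:

as.list(body(master_function))
# [[2]] and [[3]] are the x_p1/x_m1 definitions, and [[4]] is
# `path <- numeric(iter)`, so at = 4 fires once both helpers exist.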

2) instrument Another possibility is to instrument your function like this and then call it with master_function(x, DEBUG = TRUE) to turn on debugging.

master_function <- 
  function(x, iter = 100, DEBUG = FALSE){
    x_p1 <- function(){ x <<- x + 1 }
    x_m1 <- function(){ x <<- x - 1 }
    if (DEBUG) {
      debug(x_p1)
      debug(x_m1)
    }

    path <- numeric(iter)
    for(i in 1:iter){
      next_step <- sample(c('p', 'm'), 1)
      if(next_step == 'p'){
        x_p1()
      } else { 
        x_m1()
      }
      path[i] <- x
    }
    path
  }
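
For example, a call along these lines (sketch) drops into the browser at every helper invocation:

out <- master_function(0, iter = 5, DEBUG = TRUE)
# debug() makes the browser open on each x_p1()/x_m1() call;
# step with `n`, inspect `x`, and continue with `c` as usual.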
G. Grothendieck

Why does x need to reside in an alternative environment at all? The following keeps the state in ordinary arguments and return values, avoiding multiple environments entirely.

x_p1 <- function(z){ z + 1 }
x_m1 <- function(z){ z - 1 }
master_function <- 
  function(x, iter = 100){
    new_x <- x

    path <- numeric(iter)
    for(i in 1:iter){
      next_step <- sample(c('p', 'm'), 1)
      if(next_step == 'p'){
        new_x <- x_p1(new_x)
      } else { 
        new_x <- x_m1(new_x)
      }
      path[i] <- new_x
    }
    path
  }
Soren
  • This is an extremely simplified version of what I'm actually trying to achieve. Within `master_function` there are several dozen objects and 5-6 functions which aim to conditionally modify some or all of those objects in the `master_function` environment. If I stick with an approach similar to the above, it's difficult to debug any errors as I cannot place breakpoints within those functions – jameselmore Mar 20 '19 at 20:23
  • I've updated the proposal to locate x_p1 and x_m1 fully outside master_function, which pushes the original suggestion even further toward isolating each function in a self-contained scope. Avoiding "global" or cross-scoped environment variables altogether also simplifies the goal of debugging. Given the simplicity of the example it's hard to assess, but in most circumstances functions can operate using only their input variables, and can even operate on and return multiple variables by returning a list. – Soren Mar 20 '19 at 20:31
  • @jameselmore: Having various places in your code modify variables in "distant" environments is ill-advised, as it leads to confusing code logic. Place all your variables inside a list, then pass the entire list to your 5-6 functions to modify specific elements. Each function should return the modified list. This is known as [copy-on-modify semantics](https://stackoverflow.com/questions/15759117/what-exactly-is-copy-on-modify-semantics-in-r-and-where-is-the-canonical-source), which is naturally suited for functional languages like R; a sketch of this pattern appears after this thread. – Artem Sokolov Mar 20 '19 at 20:32
  • @Soren: Because of the pass-by-value nature of R, `new_x <- x` in the answer is not needed. Any changes to `x` inside `master_function()` are not visible outside the function. – Artem Sokolov Mar 20 '19 at 20:33
  • @ArtemSokolov agreed on not needing new_x, but I included it in case the OP needed the local 'x' and an external global 'x' to hold separate values. – Soren Mar 20 '19 at 20:36
  • @ArtemSokolov - makes sense, but some of my objects are very large. I imagine that approach comes with significant performance costs, no? – jameselmore Mar 20 '19 at 20:40
  • @jameselmore if you're that worried about memory, write your code in C++ and use Rcpp to interface with it. Counting bytes is not playing to R's strengths. – Hong Ooi Mar 20 '19 at 20:45
  • @HongOoi sure, but that would take an extensive amount of time. I have something that functions now and serves its purpose well in R, as it's hyper-flexible. The question at hand is whether, within R, there is a way to slightly restructure my existing code to make it a little easier to work with. – jameselmore Mar 20 '19 at 20:47
  • data.table uses modify-by-reference semantics, which can avoid keeping copies of data.frames or similar objects in memory. https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reference-semantics.html – Soren Mar 20 '19 at 20:47
  • In general, R and most popular packages present copy-on-modify semantics, but only actually copy when needed behind the scenes. As always, the best way to measure the impact of code changes on memory is [profiling](https://stackoverflow.com/questions/5184953/memory-profiling-in-r-tools-for-summarizing); the tracemem() check sketched below is one quick version. – Artem Sokolov Mar 20 '19 at 20:55
  • @jameselmore rewrite your code and worry about other stuff if it actually becomes a problem – Hong Ooi Mar 20 '19 at 20:55
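
As a follow-up to this thread, here is a minimal sketch of the list-state pattern Artem Sokolov describes above, plus the kind of quick tracemem() check he suggests. The names state and step_up are illustrative, not from the post, and tracemem() is only available when R was built with memory profiling, as the standard CRAN binaries are.

# All simulation state lives in one list; each event function takes
# the state and returns a (shallowly) modified copy.
step_up <- function(state){
  state$x <- state$x + 1   # only the list spine is copied; unchanged
  state                    # elements are shared, not duplicated
}

master_function <- function(x, iter = 100){
  state <- list(x = x, path = numeric(iter))
  for(i in seq_len(iter)){
    state <- step_up(state)          # one event per step, for brevity
    state$path[i] <- state$x
  }
  state$path
}

# Quick check that a large untouched element is not duplicated:
s <- list(x = 0, big = numeric(1e7))
tracemem(s$big)    # trace the big vector for copies
s$x <- s$x + 1     # no tracemem output: `big` is untouched and shared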