This question is similar to "Source script to separate environment in R, not the global environment", but with a key twist.

Consider a script that sources another script:

# main.R
source("funs.R")
x <- 1

# funs.R
hello <- function() {message("Hi")}

I want to source the script main.R and keep everything in a "local" environment, say env <- new.env(). Normally, one could call source("main.R", local = env) and expect everything to be in the env environment. However, that's not the case here: x is part of env, but the function hello is not! It is in .GlobalEnv.

Question: How can I source a script to a separate environment in R, even if that script itself sources other scripts, and without modifying the other scripts being sourced?

Thanks for helping, and let me know if I can clarify anything.

EDIT 1: Updated the question to make explicit that the scripts being sourced cannot be modified (assume they are not under your control).

Guilherme Salomé
  • While you cannot modify the sourced scripts, you can read them. So can't you read them as strings, modify the strings, and then execute them (for example using eval and parse)? It should work at least for the first level of scripts, but do you have several levels of sourced scripts? – MrSmithGoesToWashington Oct 02 '21 at 19:24
  • In general, fixing the sourced code (e.g. by making it a package, taking control of it if necessary) is best, followed by isolation in case you don't want to do that. In some cases there might be easier solutions, but not in general. This sounds like an XY problem, but you would need to give more background to have it solved. – jan-glx Oct 07 '21 at 09:34

3 Answers

You can use trace to inject code in functions, so you could force all source calls to set local = TRUE. Here I just override it if local is FALSE in case any nested calls to source actually set it to other environments due to special logic of their own.

env <- new.env()

# isFALSE() needs R >= 3.5.0; on older versions use identical(local, FALSE)
# (!isTRUE(local) would also override an environment passed as local)
tracer <- quote(
  if (isFALSE(local)) {
    local <- TRUE
  }
)

trace(source, tracer, print = FALSE, where = .GlobalEnv)

# if you're doing this inside a function, uncomment the next line
#on.exit(untrace(source, where = .GlobalEnv))

source("main.R", local = env)

As mentioned in the code, if you wrap this logic in a function, consider using on.exit to make sure you untrace even if there are errors.
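The wrapping described above could look roughly like the following. This is a minimal sketch, not part of the original answer: the helper name source_into and its envir argument are hypothetical, and it assumes nested scripts call source() with the default local = FALSE.

```r
# Hypothetical helper: source `file` into `envir`, redirecting nested
# source() calls that would otherwise evaluate in .GlobalEnv.
source_into <- function(file, envir = new.env(parent = globalenv())) {
  # Injected at the start of source(): only a literal FALSE is overridden,
  # so an environment passed explicitly as `local` is left untouched.
  tracer <- quote(
    if (isFALSE(local)) {
      local <- TRUE
    }
  )
  trace(source, tracer, print = FALSE, where = .GlobalEnv)
  # Restore the untraced source() even if the script throws an error.
  on.exit(untrace(source, where = .GlobalEnv), add = TRUE)

  source(file, local = envir)
  invisible(envir)
}
```

With main.R and funs.R in the working directory, env <- source_into("main.R") should leave both x and hello in env rather than in .GlobalEnv.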

EDIT: as mentioned in the comments, this could cause issues if some of the scripts you load assume there is a single (global) environment where everything ends up. I suppose you could change the tracer to something like

tracer <- quote(
  if (missing(local)) {
    local <- TRUE
  }
)

or maybe

tracer <- quote(
  if (isFALSE(local)) {
    # fetch the specific environment you created
    local <- get("env", .GlobalEnv)
  }
)

The former assumes that if the script didn't specify local at all, it doesn't care which environment ends up holding everything. The latter assumes that source calls that didn't specify local, or set it to FALSE, want everything to end up in one environment, and it modifies the logic to use your environment instead of the global one.

Alexis
  • Thanks for your answer, very interesting approach. Would you be able to clarify how `local` inside `tracer` gets captured by the `source` function? – Guilherme Salomé Oct 06 '21 at 12:16
  • This approach will cause problems in case the sourced scripts rely on `global` execution. – jan-glx Oct 06 '21 at 13:22
  • @GuilhermeSalomé you can imagine that the `quote`d code gets injected verbatim at the beginning of the function you trace, so it's essentially as if you had modified the source code itself, and you can manipulate anything inside the function call accordingly. – Alexis Oct 06 '21 at 16:08
  • @jan-glx yes, it's not really bullet-proof for any scenario, although I've added potential alternatives. – Alexis Oct 06 '21 at 16:16
  • @jan-glx That's a good call out. Do you have a minimal example in mind that you could share? – Guilherme Salomé Oct 06 '21 at 20:35
  • @Alexis that 3rd tracer looks better! However, the sourced scripts might still screw up / be screwed up by this, for example they might use `<<-` (assign in global env). – jan-glx Oct 07 '21 at 09:29
  • @jan-glx they could even use `get("...", .GlobalEnv)` because they assume that's the environment holding everything, so it certainly won't work for every possible scenario. Maybe even `.GlobalEnv` could be shadowed? But what about `globalenv()`? It wouldn't be easy to come up with a truly general-purpose solution. – Alexis Oct 07 '21 at 15:36

Disclaimer: Very ugly and potentially dangerous, but whatever.

Redefine source:

env <- new.env()
source <- function(...) base::source(..., local = env)
source("main.R")
# just remove your redefinition when you don't need it
rm(source)
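
A related variation, sketched here under assumptions and not part of the original answer, is to place the redefinition inside the target environment instead of the global one, so nothing in .GlobalEnv is masked. The helper name source_shadowed is hypothetical. The local = NULL formal swallows any local = argument a nested script passes by name.

```r
# Hypothetical helper: shadow source() only inside `envir`.
source_shadowed <- function(file, envir = new.env(parent = globalenv())) {
  # Scripts evaluated in `envir` find this wrapper before base::source;
  # the named `local` formal is ignored and `envir` is used instead.
  assign(
    "source",
    function(..., local = NULL) base::source(..., local = envir),
    envir = envir
  )
  # Remove the wrapper afterwards so `envir` only holds the scripts' objects.
  on.exit(rm("source", envir = envir), add = TRUE)

  base::source(file, local = envir)
  invisible(envir)
}
```

One limitation of this sketch: functions defined by the scripts that call source() later, after the wrapper has been removed, fall back to the regular base::source.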
nicola
  • I was going to suggest this, but with: `source <- function(...,local=NULL) base::source(...,local=env)` Just in case some of the sourced files already contain a `local=` argument – Michael Barrowman Oct 04 '21 at 21:16
  • Thanks for your answer. It is an interesting approach. I'm favoring Alexis' answer above because it doesn't require rewriting a base function. – Guilherme Salomé Oct 06 '21 at 12:19

The best way to protect yourself from side effects of code you cannot control is isolation. You can use callr to easily execute the scripts isolated in a separate R session:

using environments:

env <- new.env()
env <- as.environment(callr::r(function(env) {
    list2env(env, .GlobalEnv)
    source("main.R")
    as.list(.GlobalEnv)
}, args = list(as.list(env))))
env
#> <environment: 0x0000000018124878>
env$hello()
#> Hi

simpler version sticking to lists:

params <- list()
results <- callr::r(function(params) {
    list2env(params, .GlobalEnv)
    source("main.R")
    as.list(.GlobalEnv)
}, args = list(params))
results
#> $x
#> [1] 1
#> 
#> $hello
#> function () 
#> {
#>   message("Hi")
#> }
results$hello()
#> Hi

The params part is only needed if you actually need to provide input to the scripts (it is not used in your example). Obviously, this will not work for open connections and similar objects, since results are serialized back to the parent session. In that case, you might want to look into callr::r_session.
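
For the r_session case, a minimal sketch might look like the following. It assumes the callr package is installed and that main.R is in the working directory; the helper name source_in_session is hypothetical. The idea is that the sourced objects stay alive in a persistent child session, and only the values you explicitly ask for are sent back.

```r
# Hypothetical helper: start a persistent background R session and
# source `file` into that session's global environment.
source_in_session <- function(file) {
  rs <- callr::r_session$new()
  rs$run(function(file) {
    source(file)
    invisible(NULL)  # avoid shipping the sourced result back
  }, args = list(file))
  rs
}

# Usage (kept as comments because it needs main.R on disk):
# rs <- source_in_session("main.R")
# rs$run(function() x)   # evaluated inside the child session
# rs$close()             # shut the child session down when done
```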

jan-glx
  • Thanks for your answer. An interesting approach, I didn't know about this package before. Unfortunately, it looks like the package has a big downside: `Note that the arguments will be serialized and saved to a file, so if they are large R objects, it might take a long time for the child process to start up.` This is a deal breaker for me, but certainly useful in other situations. – Guilherme Salomé Oct 06 '21 at 20:38
  • Yes, shared memory is not so easy with R, but you could lessen the burden by keeping that temporary file in RAM only (by setting `TMPDIR=/run/user/you` or wherever you have a [ramdisk](https://unix.stackexchange.com/questions/188536/how-to-make-a-temporary-file-in-ram)) – jan-glx Oct 07 '21 at 09:12