
I'm developing a package in R. I have a bunch of functions, some of them need some global variables. How do I manage global variables in packages?

I've read something about environments, but I do not understand how they would work, or whether this is even the right way to go about it.

bartektartanus
bskard

6 Answers


You can use package-local variables through an environment. These variables will be available to multiple functions in the package, but will not be (easily) accessible to the user and will not interfere with the user's workspace. A quick and simple example is:

pkg.env <- new.env()

pkg.env$cur.val <- 0
pkg.env$times.changed <- 0

inc <- function(by=1) {
    pkg.env$times.changed <- pkg.env$times.changed + 1
    pkg.env$cur.val <- pkg.env$cur.val + by
    pkg.env$cur.val
}

dec <- function(by=1) {
    pkg.env$times.changed <- pkg.env$times.changed + 1
    pkg.env$cur.val <- pkg.env$cur.val - by
    pkg.env$cur.val
}

cur <- function(){
    cat('the current value is', pkg.env$cur.val, 'and it has been changed', 
        pkg.env$times.changed, 'times\n')
}

inc()
inc()
inc(5)
dec()
dec(2)
inc()
cur()
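Following the suggestions in the comments on this answer (give the environment an empty parent, and let code outside the package go through get/set functions rather than touching the environment directly), a package file might look like the sketch below. The file name `R/state.R` and the names `pkg_env`, `get_state()` and `set_state()` are hypothetical, chosen for illustration.

```r
# R/state.R (hypothetical file name): package-local state.
# parent = emptyenv() stops lookups such as get() and exists() from
# falling through to the namespace or the search path.
pkg_env <- new.env(parent = emptyenv())
pkg_env$cur_val <- 0

# Hypothetical accessor: fails loudly on unknown names instead of
# silently returning something from an enclosing environment.
get_state <- function(name) {
  if (!exists(name, envir = pkg_env, inherits = FALSE)) {
    stop("unknown state variable: ", name)
  }
  get(name, envir = pkg_env, inherits = FALSE)
}

# Hypothetical setter: stores the new value and invisibly returns the
# previous one (NULL if the variable did not exist yet).
set_state <- function(name, value) {
  old <- if (exists(name, envir = pkg_env, inherits = FALSE)) {
    get(name, envir = pkg_env, inherits = FALSE)
  }
  assign(name, value, envir = pkg_env)
  invisible(old)
}

set_state("cur_val", 10)
get_state("cur_val")
#> [1] 10
```

Returning the old value from the setter makes it easy for callers to restore the previous state with `on.exit(set_state(name, old))`.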
danronmoon
Greg Snow
  • This is a useful practice, to which I would add that, as a safety measure when creating environments as variable containers, one should generally set the parent environment to `emptyenv()`, to protect against accidentally picking up values higher up in the search path: thus `new.env(parent = emptyenv())`, instead of just `new.env()`. – egnha Oct 04 '16 at 10:26
  • Another addendum - you may need to do `assign('key', value, pkg.env)` instead of `pkg.env$key <- value` in recent versions of R, because `pkg.env` will usually be a locked environment. – Ken Williams Apr 18 '19 at 15:15
  • It seems like every time the package is sourced a new environment (pkg.env) gets created. This might increase the memory footprint if `require(pkg)` is executed multiple times. Is there any way to avoid it? – dabsingh Apr 28 '19 at 02:32
  • @dabsingh, I have not looked through the source code of `require` and everything that it does. But the help page for `require` says that it will not reload a namespace that is already loaded (middle of first paragraph under details). So, I don't think that a new environment will be created each time. – Greg Snow Apr 29 '19 at 15:44
  • thanks @GregSnow, haven't realized so far that one can have an open script in the package :) this was an illumination for me! – Edgar Manukyan Jun 02 '19 at 01:25
  • @Edgar Manukyan: I do not get what you mean by `open script`. Could you explain? – pietrodito Apr 23 '20 at 10:47
  • @Greg snow: Could you confirm that this way of dealing with global variables respects the CRAN Repository Policy 'Packages should not modify the global environment (user’s workspace).'? – pietrodito May 05 '20 at 10:37
  • @pietrodito, yes this only modifies things in the environment specific to the package which is completely separate from the global environment/user workspace. I have a package on CRAN that uses this with no pushback from the CRAN maintainers about it. However, this post is nearly 8 years old, now I would suggest considering the R6 package instead, someday when I have enough time I plan to rewrite the functions that use the local environment to use R6 objects instead. – Greg Snow May 05 '20 at 14:37
  • @Greg snow: I am reading Advanced R and just finished the Environments chapter. Now I fully understand the answer. R6 objects may be the solution I was looking for, thanks for the tip. – pietrodito May 06 '20 at 15:21
  • How would one access this newly created environment from outside the package? – mkirzon Feb 24 '21 at 18:23
  • @mkirzon, generally you do not want general access to package local variables from outside the package. If you really do, then write functions in the package that get/set values in the local environment. If you really want to break this, you can use the `environment` function to access the environment of a function in the package and work from there, but that can be dangerous. If you want someone outside the package to access the environment, then you are probably moving into areas where creating R6 objects is a much better (both simpler and more powerful) approach. – Greg Snow Feb 24 '21 at 18:42
  • @GregSnow I totally understand we usually shouldn't be doing this. However, I needed this for some advanced troubleshooting of a package outside of my development environment (basically was seeing different behavior as a user compared to development stage). I found the way to do this. Obviously, do so at your own risk: `packagename:::newEnvName$someVar` – mkirzon Feb 24 '21 at 18:45

You could set an option, eg

options("mypkg-myval"=3)
1+getOption("mypkg-myval")
[1] 4
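If you go the options route, a common convention is to set the default in the package's `.onLoad()` hook without overwriting a value the user has already set (for example in their `.Rprofile`), and to pass a fallback to `getOption()`. A minimal sketch, assuming a package named mypkg; the file name `zzz.R` and the helper `my_value()` are hypothetical:

```r
# zzz.R in a hypothetical package "mypkg": register option defaults at load time.
.onLoad <- function(libname, pkgname) {
  defaults <- list("mypkg-myval" = 3)
  # Only set options that are not already present in the session
  to_set <- !(names(defaults) %in% names(options()))
  if (any(to_set)) options(defaults[to_set])
  invisible()
}

# Package code reads the option, falling back to 3 if it is unset
my_value <- function() getOption("mypkg-myval", default = 3)
```

This does not remove the caveat raised in the comments: any code in the session can still change the option between calls.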
James
  • Where exactly will this be stored? – bskard Sep 26 '12 at 11:04
  • @Rimbaud In a pairlist called `.Options` located in the `base` package. – James Sep 26 '12 at 11:17
  • This is stored in a global options list for the R session in which the package is loaded. See `?options`. – Paul Hiemstra Sep 26 '12 at 11:18
  • I would say this is not typically a good practice since options in R can be edited anywhere by anybody -- it's hard to know for sure that one of your downstream dependencies didn't edit the value between one execution and the next; or even harder for several of your downstream dependencies to interoperate without interfering with the other's modification of the option. There are certainly use cases for this approach but heavy caution is warranted IMO – MichaelChirico Sep 07 '21 at 18:39

In general, global variables are evil. The underlying reason is that you want to minimize the interconnections in your package. These interconnections often give functions side effects, i.e. the outcome depends not only on the input arguments but also on the value of some global variable. Especially as the number of functions grows, this can be hard to get right and hell to debug.

For global variables in R see this SO post.

Edit in response to your comment: An alternative could be to just pass around the needed information to the functions that need it. You could create a new object which contains this info:

token_information = list(token1 = "087091287129387",
                         token2 = "UA2329723")

and require all functions that need this information to have it as an argument:

do_stuff = function(arg1, arg2, token) {
  # ... use token$token1 and token$token2 here ...
}
do_stuff(arg1, arg2, token = token_information)

In this way it is clear from the code that token information is needed in the function, and you can debug the function on its own. Furthermore, the function has no side effects, as its behavior is fully determined by its input arguments. A typical user script would look something like:

token_info = create_token(token1, token2)
do_stuff(arg1, arg2, token_info)

I hope this makes things more clear.
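In the spirit of the "handle" pattern mentioned in the comments, `create_token()` could be a small constructor that validates its inputs and returns a classed list that every API function receives explicitly. This is a sketch under assumptions: `create_token()` is not an existing function, and the class name `token_info`, the validation rules and the body of `do_stuff()` are illustrative.

```r
# Hypothetical constructor: bundles the tokens into one classed handle
create_token <- function(token1, token2) {
  stopifnot(is.character(token1), length(token1) == 1,
            is.character(token2), length(token2) == 1)
  structure(list(token1 = token1, token2 = token2), class = "token_info")
}

# Every function that talks to the service takes the handle as an argument
do_stuff <- function(arg1, arg2, token) {
  if (!inherits(token, "token_info")) {
    stop("`token` must be created with create_token()")
  }
  # A real implementation would call the service here; this sketch only
  # returns what it would send.
  list(args = c(arg1, arg2), auth = token$token1)
}

token_info <- create_token("087091287129387", "UA2329723")
res <- do_stuff(1, 2, token_info)
```

Because the handle is an ordinary argument, two handles holding different tokens can be used side by side, which a single global variable cannot support.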

Paul Hiemstra
  • Thanks for the answer. I have experience with programming and know that global variables are generally a no-go. However, I'm setting up access to a service's API, and to stay connected to this service, the functions need a couple of tokens. These tokens should be accessible to all the functions. What I've come up with is creating a .RData file that stores this data, but that seems like a bad idea. – bskard Sep 26 '12 at 09:40
  • The normal R pattern is to have some kind of 'handle' object that keeps your tokens, and pass that handle to your functions. That also lets you have multiple concurrent sessions with different tokens. That's the pattern for database access, for example. – Spacedman Sep 26 '12 at 11:10
  • I think your argument for why global variables are evil needs some tweaking for R - all of the functions you create in the package are global variables. Are they evil? ;) – hadley Oct 08 '12 at 14:45
  • All globals are evil, but some are more evil than others ;). Reference classes seem to be a more classical object-oriented approach. This would allow object methods (functions) to be local as well. – Paul Hiemstra Oct 08 '12 at 15:03

The question is unclear:

  • Just one R process or several?

  • Just on one host, or across several machines?

  • Is there common file access among them or not?

In increasing order of complexity, I'd use a file, an SQLite backend via the RSQLite package, or (my favourite :) the rredis package to write to / read from a Redis instance.
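For the simplest of these options, a file, the tokens could be written with `saveRDS()` and read back with `readRDS()`, which lets several R processes on the same host share them. A sketch under assumptions: the helper names, the package name "mypkg" and the file name `tokens.rds` are hypothetical, and `tools::R_user_dir()` requires R 4.0.0 or later.

```r
# Hypothetical default location: a per-user config directory for "mypkg"
default_token_path <- function() {
  # tools::R_user_dir() is available from R 4.0.0 onwards
  file.path(tools::R_user_dir("mypkg", which = "config"), "tokens.rds")
}

# Write the tokens, creating the directory if needed; returns the path invisibly
save_tokens <- function(tokens, path = default_token_path()) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(tokens, path)
  invisible(path)
}

# Read the tokens back, failing clearly if nothing has been saved yet
load_tokens <- function(path = default_token_path()) {
  if (!file.exists(path)) stop("no saved tokens at ", path)
  readRDS(path)
}
```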

Dirk Eddelbuettel

You could also create a list of tokens and add it to R/sysdata.rda with usethis::use_data(..., internal = TRUE). The data in this file is internal, but accessible by all functions. The only problem would arise if you only want some functions to access the tokens, which would be better served by:

  1. the environment solution already proposed above; or
  2. creating a hidden helper function that holds the tokens and returns them. Then just call this hidden function inside the functions that use the tokens, and (assuming it returns a list) you can inject them into the calling function's environment with list2env(..., envir = environment()).
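Option 2 could look like the sketch below. The names `.get_tokens()` and `use_tokens()` are hypothetical, and the token values are the placeholders used earlier in this thread.

```r
# Hypothetical internal helper (not exported) that holds the tokens
.get_tokens <- function() {
  list(token1 = "087091287129387", token2 = "UA2329723")
}

use_tokens <- function(prefix) {
  # Copy each list element into this function's evaluation frame,
  # so token1 and token2 become local variables
  list2env(.get_tokens(), envir = environment())
  paste(prefix, token1, token2)
}

use_tokens("request:")
#> [1] "request: 087091287129387 UA2329723"
```

One caveat with this approach: `R CMD check` will likely report "no visible binding for global variable" for `token1` and `token2`, because the assignment is hidden from static analysis. Indexing the list directly (`tok <- .get_tokens(); tok$token1`) or declaring the names with `utils::globalVariables()` avoids that note.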

If you don't mind adding a dependency to your package, you can use an R6 object from the package of the same name, as suggested in the comments on @greg-snow's answer.

R6 objects are real environments to which you can attach public and private methods. They are very lightweight and can be a good, more rigorous way to share a package's global variables without polluting the global environment.

Compared with @greg-snow's solution, it allows stricter control over your variables (for example, you can add methods that check types). The drawbacks are the extra dependency and, of course, having to learn the R6 syntax.

library(R6)
MyPkgOptions = R6::R6Class(
  "mypkg_options",
  public = list(
    get_option = function(x) private$.options[[x]]
  ),
  active = list(
    var1 = function(x){
      if(missing(x)) private$.options[['var1']]
      else stop("This is an environment parameter that cannot be changed")
    }
    ,var2 = function(x){
      if(missing(x)) private$.options[['var2']]
      else stop("This is an environment parameter that cannot be changed")
    }
  ),
  private = list(
    .options = list(
      var1 = 1,
      var2 = 2
    )
  )
)
# Create an instance
mypkg_options = MyPkgOptions$new()
# Fetch values from active fields
mypkg_options$var1
#> [1] 1
mypkg_options$var2
#> [1] 2
# Alternative way
mypkg_options$get_option("var1")
#> [1] 1
mypkg_options$get_option("var3")
#> NULL
# Variables are locked unless you add a method to change them
mypkg_options$var1 = 3
#> Error in (function (x) : This is an environment parameter that cannot be changed

Created on 2020-05-27 by the reprex package (v0.3.0)

Duccio A