35

In the "What is the most useful R trick?" (here), I read that using environments gives "pass-by-reference capabilities". Are there any limits and/or gotchas with this approach?

Also, in general what are the pros and cons of using created environments? This is something I've been confused about for quite some time, so any clarity or reference would be very helpful to me.

Thank you in advance.

ramhiser
  • This may be gnarly and is surely beyond the scope of what's needed for a "standard" R user (`install.packages` + import data + run statistical tests on `data.frame`s). But for gnarly tasks, I sometimes think of `new.env` like creating a pointer. If I wanted to emulate a C `struct` then I would do that with a sequence of nested `new.env`'s. – isomorphismes Apr 16 '15 at 21:00
  • You can also do [multi-assign](https://stat.ethz.ch/R-manual/R-devel/library/base/html/list2env.html) with environments. (A supposedly missing feature which eg Pythonistas sometimes complain about.) – isomorphismes Apr 16 '15 at 21:02
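
To illustrate the multi-assign idea from the comment above, here is a minimal sketch using `list2env`. The names `vals` and `target` are made up for the example.

```r
# Hypothetical values to assign in one go.
vals <- list(alpha = 1, beta = "two")

# A fresh environment to receive them (any environment would do).
target <- new.env()

# list2env() creates one binding per list element.
list2env(vals, envir = target)

target$alpha
target$beta
```

The same call with `envir = environment()` would assign into the calling scope, which is the closest thing to Python-style unpacking.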

2 Answers

38

While I agree with Harlan's overall advice (i.e. don't use something unless you understand it), I would add:

Environments are a fundamental concept in R, and in my view, extremely useful (in other words: they're worth understanding!). You need to understand environments to make sense of scoping issues. Some basic things that you should understand in this context:

  1. search(): will show you the search path; the attached environments are listed in order of priority. The main environment is .GlobalEnv, and can always be referenced as such.
  2. ls(): will show you what's contained in an environment
  3. attach/detach: attach adds an object (e.g. a list or data frame) to the search path as a new environment; detach removes it again
  4. get, assign, <<-, and <-: you should know the difference between these functions
  5. with: one method for working with an environment without attaching it.
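
As a rough sketch of the functions in the list above (the environment `e` and the `make_counter` helper are invented for this example, not taken from any package):

```r
# A new environment with one binding, created via assign().
e <- new.env()
assign("x", 10, envir = e)

get("x", envir = e)   # look x up in e
ls(e)                 # names defined in e

# <- creates or changes a binding in the current environment;
# <<- walks up the enclosing environments and modifies the first match.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates count in make_counter's environment
    count
  }
}
counter <- make_counter()
counter()
counter()

# with() evaluates an expression using e's bindings, without attaching e.
with(e, x * 2)

# The search path starts at .GlobalEnv.
search()[1]
```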

Another pointer: have a look at the proto package (used in ggplot), which uses environments to provide controlled inheritance.

Lastly, I would point out that environments are very similar to lists: they can both store any kind of object within them (see this question). But depending on your use case (e.g. do you want to deal with inheritance and priority), a list can be easier to work with. And you can always attach a list as an environment.
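
The main practical difference, and the main gotcha, is that lists are copied on modification while environments are not. A minimal sketch, with made-up function and variable names:

```r
# Both functions try to change `value` on their argument.
modify_list <- function(l) { l$value <- 99; invisible(NULL) }
modify_env  <- function(e) { e$value <- 99; invisible(NULL) }

my_list <- list(value = 1)
my_env  <- new.env()
my_env$value <- 1

modify_list(my_list)   # changes a local copy only
modify_env(my_env)     # changes the caller's environment

my_list$value          # unchanged
my_env$value           # changed by the function

# Gotcha: assigning an environment to a new name does not copy it.
alias <- my_env
alias$value <- -1
my_env$value           # also changed, since both names refer to one object

# as.list() takes a snapshot that behaves like an ordinary list again.
snapshot <- as.list(my_env)
```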

Edit: If you want to see an example of proto at work in ggplot, have a look at the structure of a ggplot object, which is essentially a list composed partially of environments:

> p <- qplot(1:10, 1:10)
> str(p)
List of 8
 $ data       :'data.frame':    0 obs. of  0 variables
 $ layers     :List of 1
  ..$ :proto object 
 .. .. $ legend     : logi NA 
 .. .. $ inherit.aes: logi TRUE 
...
> class(p$layers[[1]])
[1] "proto"       "environment"
> is.environment(p$layers[[1]])
[1] TRUE

Notice how it's constructed using proto and contains many environments as a result. You can also plot the relationships in these objects using graph.proto.

Shane
  • I'm hoping to understand them so that I can potentially use them. I'm somewhat familiar with the scoping rules in R and with most of the functions that you have listed, but I will explore their details in more depth. Thanks for the info. – ramhiser Jul 16 '10 at 18:37
  • 3
    Completely agree, Shane! It's important to understand environments and scoping in R if you're building any significant amount of code! But that doesn't necessarily imply you should use environments as data structures. – Harlan Jul 16 '10 at 18:46
  • 2
    @Harlan: I completely agree. Maybe I should be more forceful on that front. @John: Don't use environments unless you (1) understand them and (2) have a good reason to do so. A list is generally a better option. IMO, it's a best practice to avoid side-effects unless you absolutely can't! – Shane Jul 16 '10 at 18:50
  • Thanks for this answer Shane, I noticed that it hasn't been accepted so I'll take that to mean that I can be so bold as to ask for more? I have a dataset that was saved inside a new environment, named e.g. dta.env, so I did ls(dta.env), but it just returned a list of the data tickers. Is there a way to explore the environment in greater depth? to get a more thorough list of what's inside? (not sure if this qualifies as a new question altogether) If there's more to ls(), please would you expand on that? Thanks anyhows. – PatrickT Mar 23 '13 at 13:46
  • To understand environments, you also need to know that [every environment has a parent environment](http://adv-r.had.co.nz/Environments.html), and names from the parent scope will be found by some operations on the environment (e.g., `exists`). To create an environment without a parent you [give it the empty environment](https://stackoverflow.com/a/42350672/1048186): `new.env(parent = emptyenv())` – Josiah Yoder Jul 01 '21 at 17:16
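
A small sketch of the parent lookup described in the last comment. The environments `parent_env`, `child_env` and `isolated` are invented for the example.

```r
parent_env <- new.env()
assign("a", 1, envir = parent_env)

# child_env has no binding for `a` itself, but its parent does.
child_env <- new.env(parent = parent_env)

exists("a", envir = child_env)                     # found via the parent
exists("a", envir = child_env, inherits = FALSE)   # not found in child_env itself

# With the empty environment as parent, nothing is inherited.
isolated <- new.env(parent = emptyenv())
exists("a", envir = isolated)

# Gotcha: even base functions such as `+` are not visible from there,
# so evaluating code in such an environment fails.
result <- tryCatch(eval(quote(1 + 1), isolated), error = function(err) err)
inherits(result, "error")
```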
6

Well, if you don't understand them, and the people you might someday have to read your code (including your future self) don't understand environments, then you shouldn't use them! They were designed to be used to encapsulate name spaces in packages and such. The fact that you can use them for pass-by-reference and hash tables doesn't necessarily mean you should. It's a trick. Generally, use of deep magic is not really advisable, even if it makes your code a little faster.
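
For reference, the hash-table trick mentioned above looks roughly like this. It is only a sketch; the name `h` and the keys are arbitrary.

```r
# An environment used as a string-keyed hash table.
# parent = emptyenv() keeps lookups from falling through to other scopes.
h <- new.env(hash = TRUE, parent = emptyenv())

h[["apple"]] <- 3                  # insert
assign("banana", 5, envir = h)     # insert, alternative spelling

exists("apple", envir = h, inherits = FALSE)   # membership test
mget(c("apple", "banana"), envir = h)          # fetch several keys as a list

rm("apple", envir = h)             # delete
ls(h)                              # remaining keys
```

Every one of these calls changes `h` in place, which is exactly the side-effect behaviour the comments below warn about.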

Harlan
  • 1
    So whenever I come across a new trick, I should avoid it because I don't understand it? Often I perform operations on large covariance matrices by passing them from function to function. Would using environments improve performance in this situation enough to warrant using them? – ramhiser Jul 16 '10 at 18:17
  • I'm not entirely sure of the implementation details, but I believe that if you don't modify the large matrices within the functions, they're not actually copied. As to your larger question, I'd advise that if you need the speed, it may be worth learning the wizardry; just keep in mind that it's a (mild) abuse of the language's semantics to do so, and that you may regret it later. Or, you may not regret it! – Harlan Jul 16 '10 at 18:44
  • 7
    +1 To touch on Harlan's concerns: yes, this is a dangerous usage because it introduces "side-effects". Whenever you allow a function to alter the outside world, you are opening yourself up to unexpected behavior. http://en.wikipedia.org/wiki/Side_effect_(computer_science) – Shane Jul 16 '10 at 18:47
  • +1 to Shane's comment. In this increasingly parallelized world it is good practice to start cutting back on uses of side effects. – Sharpie Jul 17 '10 at 14:12