
This post (Lazy evaluation in R – is assign affected?) covers some common ground but I am not sure it answers my question.

I stopped using assign when I discovered the apply family quite a while back, albeit purely for reasons of elegance in situations such as this:

names.foo <- letters
values.foo <- LETTERS
for (i in 1:length(names.foo))
  assign(names.foo[i], paste("This is: ", values.foo[i]))

which can be replaced by:

foo <- lapply(X=values.foo, FUN=function(k) paste("This is: ", k))
names(foo) <- names.foo

This is also the reason the R FAQ (http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f) says this approach should be avoided.

Now, I know that assign is generally frowned upon. But are there other reasons I don't know about? I suspect it may mess with scoping or lazy evaluation, but I am not sure. Example code that demonstrates such problems would be great.

asb

3 Answers


Actually those two operations are quite different. The first gives you 26 different objects while the second gives you only one. The second object will be a lot easier to use in analyses. So I guess I would say you have already demonstrated the major downside of assign, namely the need to always use get to corral or gather up all the similarly named individual objects that are now "loose" in the global environment. Try imagining how you would serially do anything with those 26 separate objects. A simple lapply(foo, func) will suffice for the second strategy.
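To make the "corralling" cost concrete, here is a small sketch (variable names mirror the question; mget is one way to gather the loose objects back up):

```r
names.foo  <- letters
values.foo <- LETTERS

# assign() strategy: 26 separate objects land in the current environment
for (i in seq_along(names.foo))
  assign(names.foo[i], paste("This is:", values.foo[i]))

# To do anything with them collectively, you must first gather them:
loose <- mget(names.foo)

# list strategy: one object, no gathering step needed
foo <- lapply(values.foo, function(k) paste("This is:", k))
names(foo) <- names.foo

identical(loose, foo)   # TRUE: same contents, very different effort
```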

That FAQ citation really only says that using assignment and then assigning names is easier; it does not imply that assign is "bad". I happen to read it as "less functional", since you are not actually returning a value that gets assigned. The effect looks to be a side effect (and in this case the assign strategy results in 26 separate side effects). The use of assign seems to be adopted by people coming from languages that have global variables, as a way of avoiding picking up the "True R Way", i.e. functional programming with data objects. They really should be learning to use lists rather than littering their workspace with individually named items.

There is another assignment paradigm that can be used:

foo <- setNames(paste0(letters, 1:26), LETTERS)

That creates a named atomic vector rather than a named list, but the access to values in the vector is still done with names given to [.
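For illustration, access on such a named vector looks like this (values here follow the setNames example above):

```r
foo <- setNames(paste0(letters, 1:26), LETTERS)

foo["A"]            # named element: "a1" with name "A"
foo[c("B", "Z")]    # subsetting by several names at once
unname(foo["C"])    # "c3", with the name stripped off
```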

IRTFM
  • In my understanding setNames is just a hidden `names<-`, which is another pet peeve of mine! Is there any particular advantage over `names`? – asb Jul 09 '13 at 22:58
  • Why should it be a "pet peeve"? It is easy to read and some people prefer the more compact code. My pet peeve is using loops when they are not needed. – IRTFM Jul 09 '13 at 23:03
  • @asb how about `sapply(X=values.foo, FUN=function (k) paste("This is :", k), simplify=FALSE)` to avoid your `names<-` call ... or to hide it differently, I guess. – GSee Jul 09 '13 at 23:04
  • @DWin: I meant having to use `names(foo) <- bar` is a pet peeve. Also, because it tends to make copies at times, especially for data.frames. And I would rather like to know a sure shot way to avoid making those copies. – asb Jul 09 '13 at 23:10
  • @GSee: Interesting. In my mind `sapply` and `lapply` have always been for different use cases unless I deliberately want to use `sapply` to fail at `simplify2array`. – asb Jul 09 '13 at 23:12
  • +1 -- another important point in my opinion is when writing functions. A function can only return one object, so a list becomes a handy wrapper for returning multiple objects. Without lists, you would have to make the function `assign` variables to the parent environment, i.e. have side effects. That would be very frowned upon. – flodel Jul 09 '13 at 23:23
  • Don't the apply functions have a loop inside their definition? A common reflex is to use a function in the apply family. This is not vectorization, it is loop-hiding. The apply function has a for loop in its definition. "The lapply function buries the loop, but execution times tend to be roughly equal to an explicit for loop." The R Inferno, Circle 4 (Over-vectorizing) - [link](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf) – marbel Jul 10 '13 at 00:06
  • @MartínBel: +1 for the R Inferno! However, I don't completely agree. Perhaps this should be my next question after I revisit Circle 4 from the R Inferno. While `apply` is loop-hiding, `lapply` and `vapply` are in fact `Internal` functions in R. I am guessing they have some optimization. OTOH, loop-hiding and functional idioms can improve density and readability and are good too, me thinks. – asb Jul 10 '13 at 00:17
  • That's true, he isn't completely against the apply functions. He says the following: "Use an explicit for loop when each iteration is a non-trivial task. But a simple loop can be more clearly and compactly expressed using an apply function. There is at least one exception to this rule." – marbel Jul 10 '13 at 00:44
  • @asb Exception: However, with larger problems this could easily eat all the memory on a machine. Suppose we have a data frame and we want to change the missing values to zero. Then we can do that in a perfectly vectorized manner: `x[is.na(x)] <- 0`. But if x is large, then this may take a lot of memory. If (as is common) the number of rows is much larger than the number of columns, then a more memory-efficient method is: `for(i in 1:ncol(x)) x[is.na(x[,i]), i] <- 0` – marbel Jul 10 '13 at 00:45
  • @MartínBel: Somebody's been thorough with the R Inferno! ;) – asb Jul 10 '13 at 00:51
  • @MartínBel: the compromise in both runtime and memory consumption would be to work in chunks instead of element-wise. On second thought, I suspect you won't get very far with your data analysis anyway if you don't have memory for a single further copy of `x`. – cbeleites unhappy with SX Jul 10 '13 at 06:19

As the source of fortune(236), I thought I would add a couple of examples (also see fortune(174)).

First, a quiz. Consider the following code:

x <- 1
y <- some.function.that.uses.assign(rnorm(100))

After running the above 2 lines of code, what is the value of x?

The assign function is used to commit "Action at a distance" (see http://en.wikipedia.org/wiki/Action_at_a_distance_(computer_programming) or google for it). This is often the source of hard to find bugs.
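The quiz function is deliberately left undefined; as one hypothetical body (this definition is mine, not part of the quiz) that would produce the surprise:

```r
# Hypothetical definition -- any function that calls assign() on an
# enclosing environment exhibits the same "action at a distance".
some.function.that.uses.assign <- function(dat) {
  # Silently overwrites a variable the caller owns:
  assign("x", length(dat), envir = globalenv())
  mean(dat)
}

x <- 1
y <- some.function.that.uses.assign(rnorm(100))
x   # run at top level, x is now 100 -- changed behind the reader's back
```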

I think the biggest problem with assign is that it tends to lead people down paths of thinking that take them away from better options. A simple example is the 2 sets of code in the question. The lapply solution is more elegant and should be promoted, but the mere fact that people learn about the assign function leads people to the loop option. Then they decide that they need to do the same operation on each object created in the loop (which would be just another simple lapply or sapply if the elegant solution were used) and resort to an even more complicated loop involving both get and apply along with ugly calls to paste. Then those enamored with assign try to do something like:

curname <- paste('myvector[', i, ']')
assign(curname, i)

And that does not do quite what they expected which leads to either complaining about R (which is as fair as complaining that my next door neighbor's house is too far away because I chose to walk the long way around the block) or even worse, delve into using eval and parse to get their constructed string to "work" (which then leads to fortune(106) and fortune(181)).
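To see concretely what the snippet above actually does, a short sketch:

```r
myvector <- 1:5
i <- 3

curname <- paste('myvector[', i, ']')   # "myvector[ 3 ]" -- note paste's spaces
assign(curname, i)

myvector                 # untouched: still 1 2 3 4 5
get("myvector[ 3 ]")     # a brand-new variable whose *name* contains brackets
```

assign created a new object literally named `myvector[ 3 ]` instead of modifying the third element of myvector, which is the surprise the answer describes.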

Greg Snow

I'd like to point out that assign is meant to be used with environments.

From that point of view, the "bad" thing in the example above is using a not quite appropriate data structure (the base environment instead of a list or data.frame, vector, ...).

Side note: the $ and $<- operators also work for environments, so in many cases the explicit assign and get aren't necessary there either.
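A minimal sketch of that point (the environment and field names here are invented for illustration):

```r
e <- new.env()

e$count <- 0            # `$<-` on an environment, no assign() needed
e$count <- e$count + 1
e$count                 # 1

# equivalent to, but tidier than:
assign("count2", 0, envir = e)
get("count2", envir = e)
```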

cbeleites unhappy with SX
  • And the usual reason they resort to `assign` is to get a constructed variable name, which does not work with `$<-`. So we should also note that `[[<-` "works" with environments, so one could do: `myEnv[[paste0("my", "Var", 1)]] <- value` – IRTFM Jul 10 '13 at 21:19
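Expanding that comment into a runnable sketch (myEnv and value are illustrative names from the comment):

```r
myEnv <- new.env()
value <- 42

# Constructed name via `[[<-`, no assign() required:
myEnv[[paste0("my", "Var", 1)]] <- value

myEnv$myVar1   # 42
ls(myEnv)      # "myVar1"
```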