
I often see questions from novice R programmers where they've used assign to create multiple objects, and then run into trouble trying to manipulate those objects for a subsequent task (a recent example).

assign appeals to novice users because it has dynamic properties (programmatically creating variable names, in addition to the variable's values), and seems to mimic some properties of global assignment. Its straightforward name also makes it likely to show up in searches for a variety of problem types.

Of course, more experienced R programmers come to realize that assign creates code that is hard to read, fragile to maintain, and acts via the type of side effects that are otherwise staunchly avoided in the highly functional R language.

Every question I've seen on SO where the OP initially used assign ultimately has a better alternative in the correct use of named vectors, lists, or data frames. The resulting code is easier to follow, more robust to change, and often more performant.
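As a minimal sketch of that contrast (the toy data and the names `samples`, `scratch`, and `means` are made up for illustration, not taken from any particular question), compare scattering results across dynamically named objects with keeping them in one named vector:

```r
# Hypothetical toy data: three numeric samples to summarise.
samples <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# assign() style: one separately named object per sample.
scratch <- new.env()
for (nm in names(samples)) {
  assign(paste0("mean_", nm), mean(samples[[nm]]), envir = scratch)
}
# Using the values again means rebuilding the names and calling mget().
means_from_assign <- unlist(mget(paste0("mean_", names(samples)), envir = scratch))

# Vector style: a single named vector holds every result.
means <- vapply(samples, mean, numeric(1))
```

Both approaches compute the same numbers, but the second keeps them in one object that can be indexed, looped over, or passed to another function without reconstructing any names.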

All this is to say, it's easy to find examples of why assign is bad. My question is: in what situations would the use of assign be the appropriate, preferred, or only solution?

jdobres

1 Answer


If you were constructing a program that mediated a dialogue with a user, in which the user was asked to enter an arbitrary object name (in the specific R sense of an unquoted string that is listed in a particular namespace), you might consider using assign.
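A rough sketch of that scenario follows. The helper `store_as`, the environment `user_env`, and the hard-coded string standing in for interactive input are all hypothetical; the point is only that the name arrives as a character string, so `<-` cannot be used with it directly.

```r
# Hypothetical helper: store `value` under a name supplied as a string.
store_as <- function(name, value, envir) {
  stopifnot(is.character(name), length(name) == 1L, nzchar(name))
  assign(name, value, envir = envir)
  invisible(name)
}

user_env <- new.env()      # a dedicated environment rather than globalenv()
entered  <- "my_result"    # stands in for a name typed by the user
store_as(entered, 42, user_env)

# The object now exists under the user-chosen name.
get(entered, envir = user_env)
```

Writing `entered <- 42` would overwrite `entered` itself rather than create `my_result`, which is why a function taking the name as a string is needed here.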

The option to assign to a particular environment may also have value. Notice how it is used in the ecdf function:

ecdf
#----screen output----
function (x) 
{
    x <- sort(x)
    n <- length(x)
    if (n < 1) 
        stop("'x' must have 1 or more non-missing values")
    vals <- unique(x)
    rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n, 
        method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
    class(rval) <- c("ecdf", "stepfun", class(rval))
    assign("nobs", n, envir = environment(rval))
    attr(rval, "call") <- sys.call()
    rval
}
<bytecode: 0x7c77cc0>
<environment: namespace:stats>

The ecdf function takes data and returns another function. Most of that function is built with a C call by approxfun, but as a last step, ecdf adds a variable, nobs, to the environment of the returned value (which is itself a function).
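The same pattern can be sketched in miniature. The two factories below, `make_step` and `make_flagger`, are hypothetical stand-ins for approxfun and ecdf; they are not how the stats package is implemented, only an illustration of writing into the environment of a function that another function created.

```r
# Hypothetical inner factory, standing in for approxfun().
make_step <- function(cutoff) {
  function(v) as.numeric(v >= cutoff)
}

# Hypothetical outer factory, standing in for ecdf().
make_flagger <- function(x) {
  rval <- make_step(median(x))
  # rval's environment is the frame of make_step(), not this frame, so a
  # plain `nobs <- length(x)` here would not end up alongside rval.
  assign("nobs", length(x), envir = environment(rval))
  rval
}

flag <- make_flagger(c(2, 4, 6))
flag(5)                                  # 5 is at or above the median, 4
get("nobs", envir = environment(flag))   # the value stored by assign()
```

Because the closure's environment is created elsewhere, assign with an explicit `envir` argument is a direct way to attach the extra value to it.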

I'm sure you could find other instances where assign is used in the R code of the base and stats packages. Those are arguably "R Core Certified™" examples of "proper" uses.

When I followed my own advice I got this from a bash operation:

$ cd '/home/david/Downloads/R-3.5.2/src/library/base/R/' 
$ grep -R "assign" 
# --- results with a recent download of the R sources -----
userhooks.R:        assign(hookName, new, envir = .userHooksEnv, inherits = FALSE)
datetime.R:    cacheIt <- function(tz) assign(".sys.timezone", tz, baseenv())
autoload.R: assign(".Autoloaded", c(package, .Autoloaded), envir =.AutoloadEnv)
lazyload.R:    ## set <- function (x,  value,  env) .Internal(assign(x,  value,  env,  FALSE))
delay.R:    function(x, value, eval.env=parent.frame(1), assign.env=parent.frame(1))
delay.R:    .Internal(delayedAssign(x, substitute(value), eval.env, assign.env))
assign.R:#  File src/library/base/R/assign.R
assign.R:assign <-
assign.R:    .Internal(assign(x, value, envir, inherits))
# stripped out some occurrences of "assignment"
# stripped out the occurrences of "assign" in the namespace functions
zzz.R:assign("%*%", function(x, y) NULL, envir = .ArgsEnv)
zzz.R:assign("...length", function() NULL, envir = .ArgsEnv)
zzz.R:assign("...elt", function(n) NULL, envir = .ArgsEnv)
zzz.R:assign(".C", function(.NAME, ..., NAOK = FALSE, DUP = TRUE, PACKAGE,
zzz.R:assign(".Fortran",
zzz.R:assign(".Call", function(.NAME, ..., PACKAGE) NULL, envir = .ArgsEnv)
zzz.R:assign(".Call.graphics", function(.NAME, ..., PACKAGE) NULL, envir = .ArgsEnv)
zzz.R:assign(".External", function(.NAME, ..., PACKAGE) NULL, envir = .ArgsEnv)
zzz.R:assign(".External2", function(.NAME, ..., PACKAGE) NULL, envir = .ArgsEnv)
zzz.R:assign(".External.graphics", function(.NAME, ..., PACKAGE) NULL,
zzz.R:assign(".Internal", function(call) NULL, envir = .ArgsEnv)
zzz.R:assign(".Primitive", function(name) NULL, envir = .ArgsEnv)
zzz.R:assign(".isMethodsDispatchOn", function(onOff = NULL) NULL, envir = .ArgsEnv)
zzz.R:assign(".primTrace", function(obj) NULL, envir = .ArgsEnv)
zzz.R:assign(".primUntrace", function(obj) NULL, envir = .ArgsEnv)
zzz.R:assign(".subset", function(x, ...) NULL, envir = .ArgsEnv)
zzz.R:assign(".subset2", function(x, ...) NULL, envir = .ArgsEnv)
zzz.R:assign("UseMethod", function(generic, object) NULL, envir = .ArgsEnv)
zzz.R:assign("as.call", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("attr", function(x, which, exact = FALSE) NULL, envir = .ArgsEnv)
zzz.R:assign("attr<-", function(x, which, value) NULL, envir = .ArgsEnv)
zzz.R:assign("attributes", function(obj) NULL, envir = .ArgsEnv)
zzz.R:assign("attributes<-", function(obj, value) NULL, envir = .ArgsEnv)
zzz.R:assign("baseenv", function() NULL, envir = .ArgsEnv)
zzz.R:assign("browser",
zzz.R:assign("call", function(name, ...) NULL, envir = .ArgsEnv)
zzz.R:assign("class", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("class<-", function(x, value) NULL, envir = .ArgsEnv)
zzz.R:assign(".cache_class", function(class, extends) NULL, envir = .ArgsEnv)
zzz.R:assign("emptyenv", function() NULL, envir = .ArgsEnv)
zzz.R:assign("enc2native", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("enc2utf8", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("environment<-", function(fun, value) NULL, envir = .ArgsEnv)
zzz.R:assign("expression", function(...) NULL, envir = .ArgsEnv)
zzz.R:assign("forceAndCall", function(n, FUN, ...) NULL, envir = .ArgsEnv)
zzz.R:assign("gc.time", function(on = TRUE) NULL, envir = .ArgsEnv)
zzz.R:assign("globalenv", function() NULL, envir = .ArgsEnv)
zzz.R:assign("interactive", function() NULL, envir = .ArgsEnv)
zzz.R:assign("invisible", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.atomic", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.call", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.character", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.complex", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.double", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.environment", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.expression", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.function", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.integer", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.language", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.list", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.logical", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.name", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.null", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.object", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.pairlist", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.raw", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.recursive", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.single", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("is.symbol", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("isS4", function(object) NULL, envir = .ArgsEnv)
zzz.R:assign("list", function(...) NULL, envir = .ArgsEnv)
zzz.R:assign("lazyLoadDBfetch", function(key, file, compressed, hook) NULL,
zzz.R:assign("missing", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("nargs", function() NULL, envir = .ArgsEnv)
zzz.R:assign("nzchar", function(x, keepNA=FALSE) NULL, envir = .ArgsEnv)
zzz.R:assign("oldClass", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("oldClass<-", function(x, value) NULL, envir = .ArgsEnv)
zzz.R:assign("on.exit", function(expr = NULL, add = FALSE, after = TRUE) NULL, envir = .ArgsEnv)
zzz.R:assign("pos.to.env", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("proc.time", function() NULL, envir = .ArgsEnv)
zzz.R:assign("quote", function(expr) NULL, envir = .ArgsEnv)
zzz.R:assign("retracemem", function(x, previous = NULL) NULL, envir = .ArgsEnv)
zzz.R:assign("seq_along", function(along.with) NULL, envir = .ArgsEnv)
zzz.R:assign("seq_len", function(length.out) NULL, envir = .ArgsEnv)
zzz.R:assign("standardGeneric", function(f, fdef) NULL, envir = .ArgsEnv)
zzz.R:assign("storage.mode<-", function(x, value) NULL, envir = .ArgsEnv)
zzz.R:assign("substitute", function(expr, env) NULL, envir = .ArgsEnv)
zzz.R:assign("switch", function(EXPR, ...) NULL, envir = .ArgsEnv)
zzz.R:assign("tracemem", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("unclass", function(x) NULL, envir = .ArgsEnv)
zzz.R:assign("untracemem", function(x) NULL, envir = .ArgsEnv)
zzz.R:     assign(f, fx, envir = env)  # grep fails to include the names of these
zzz.R:        assign(f, fx, envir = env)
zzz.R:        assign(f, fx, envir = env)
zzz.R:        assign(f, fx, envir = env)
zzz.R:        assign(f, fx, envir = env)
zzz.R:    assign("anyNA", fx, envir = env)
zzz.R:assign("!", function(x) UseMethod("!"), envir = .GenericArgsEnv)
zzz.R:assign("as.character", function(x, ...) UseMethod("as.character"),
zzz.R:assign("as.complex", function(x, ...) UseMethod("as.complex"),
zzz.R:assign("as.double", function(x, ...) UseMethod("as.double"),
zzz.R:assign("as.integer", function(x, ...) UseMethod("as.integer"),
zzz.R:assign("as.logical", function(x, ...) UseMethod("as.logical"),
zzz.R:#assign("as.raw", function(x) UseMethod("as.raw"), envir = .GenericArgsEnv)
zzz.R:## assign("c", function(..., recursive = FALSE, use.names = TRUE) UseMethod("c"),
zzz.R:assign("c", function(...) UseMethod("c"),
zzz.R:#assign("dimnames", function(x) UseMethod("dimnames"), envir = .GenericArgsEnv)
zzz.R:assign("dim<-", function(x, value) UseMethod("dim<-"), envir = .GenericArgsEnv)
zzz.R:assign("dimnames<-", function(x, value) UseMethod("dimnames<-"),
zzz.R:assign("length<-", function(x, value) UseMethod("length<-"),
zzz.R:assign("levels<-", function(x, value) UseMethod("levels<-"),
zzz.R:assign("log", function(x, base=exp(1)) UseMethod("log"),
zzz.R:assign("names<-", function(x, value) UseMethod("names<-"),
zzz.R:assign("rep", function(x, ...) UseMethod("rep"), envir = .GenericArgsEnv)
zzz.R:assign("round", function(x, digits=0) UseMethod("round"),
zzz.R:assign("seq.int", function(from, to, by, length.out, along.with, ...)
zzz.R:assign("signif", function(x, digits=6) UseMethod("signif"),
zzz.R:assign("trunc", function(x, ...) UseMethod("trunc"), envir = .GenericArgsEnv)
zzz.R:#assign("xtfrm", function(x) UseMethod("xtfrm"), envir = .GenericArgsEnv)
zzz.R:assign("as.numeric", get("as.double", envir = .GenericArgsEnv),
IRTFM
  • Don't agree with your first paragraph: I'd use a list there too. The rest is spot on. – Roland Jan 06 '19 at 18:57
  • The result of a user-entered string is usually a character. If you are trying to create an object with that name (in the R sense of an unquoted value), you need to do something that doesn't use `<-` unless you try to use `eval(parse(text=` and that's probably even less readable and maintainable. – IRTFM Jan 06 '19 at 19:13
  • `ecdf()` is an interesting example. It looks like `assign` is used to place the value of `nobs` into the `rval` environment, `rval` being a dynamic function created from `approxfun`. I'm hard-pressed to think of an alternate way to place a value into a dynamic function, other than refactoring the entire thing to be list-based, which wouldn't be quite the same. So the short answer to "When must you use `assign`?" would likely be: "true dynamic programming, which most R users never do directly". – jdobres Jan 06 '19 at 19:14
  • The other valid use appears to be constructing other "S3 assignment functions" with "suffixes" of `"...<-"`. Also building "dot functions" and a bunch of `is.*` functions. – IRTFM Jan 06 '19 at 19:22
  • @jdobres It's interesting that you use the term "dynamic programming", since most occurrences of that term in questions on SO that I've seen apply to the paragraph that you are disagreeing with more than they apply to the parts you are endorsing. That said, I have always wondered whether "dynamic" had any agreed-upon meaning when applied to programming. It has seemed to be used to indicate a wide variety of paradigms. – IRTFM Jan 06 '19 at 19:31
  • Entirely possible that I'm being a little "dynamic" with my terminology. ;) – jdobres Jan 06 '19 at 19:32
  • It's interesting to note that all the examples in `?assign` demonstrate what we could say are bad practice. Perhaps the help should be rewritten to demonstrate some good examples. – dww Jan 06 '19 at 19:33
  • @42- `userinputlist[[objectname]] <- objectvalue`, but I wouldn't even let the user define the object name that is used internally. – Roland Jan 07 '19 at 08:04