223

I am interested in what is the "correct" way to write functions with optional arguments in R. Over time, I stumbled upon a few pieces of code that take a different route here, and I couldn't find a proper (official) position on this topic.

Up until now, I have written optional arguments like this:

fooBar <- function(x,y=NULL){
  if(!is.null(y)) x <- x+y
  return(x)
}
fooBar(3) # 3
fooBar(3,1.5) # 4.5

The function simply returns its argument if only x is supplied. It uses a default NULL value for the second argument and if that argument happens to be not NULL, then the function adds the two numbers.

Alternatively, one could write the function like this (where the second argument needs to be specified by name, but one could also unlist(z) or define z <- sum(...) instead):

fooBar <- function(x,...){
  z <- list(...)
  if(!is.null(z$y)) x <- x+z$y
  return(x)
}
fooBar(3) # 3
fooBar(3,y=1.5) # 4.5

Personally I prefer the first version. However, I can see good and bad with both. The first version is a little less prone to error, but the second one could be used to incorporate an arbitrary number of optionals.

Is there a "correct" way to specify optional arguments in R? So far, I have settled on the first approach, but both can occasionally feel a bit "hacky".

SimonG
  • 1
    Check out the source code for `xy.coords` to see a commonly used approach. – Carl Witthoft Feb 06 '15 at 19:39
  • 11
    The source code for `xy.coords` mentioned by [Carl Witthoft](http://stackoverflow.com/users/884372/carl-witthoft) can be found at [xy.coords](https://github.com/wch/r-source/blob/af7f52f70101960861e5d995d3a4bec010bc89e6/src/library/grDevices/R/xyz.coords.R#L21-L126) – RubenLaguna Oct 30 '16 at 09:43

7 Answers

177

You could also use missing() to test whether or not the argument y was supplied:

fooBar <- function(x,y){
    if(missing(y)) {
        x
    } else {
        x + y
    }
}

fooBar(3,1.5)
# [1] 4.5
fooBar(3)
# [1] 3
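
One consequence of testing with missing() is that NULL stays available as a real value the caller can pass. The sketch below illustrates this with a hypothetical function `describe` and a hypothetical argument `label`; neither is part of the original example.

```r
# Hypothetical helper: 'label' is optional, and NULL is a legitimate value for it.
describe <- function(x, label) {
  if (missing(label)) {
    "label not supplied"
  } else if (is.null(label)) {
    "label explicitly set to NULL"
  } else {
    paste("label:", label)
  }
}

describe(1)                # argument omitted
describe(1, label = NULL)  # argument supplied, value is NULL
describe(1, label = "a")   # argument supplied with a value
```

With a `label = NULL` default and an is.null() test, the first two calls could not be told apart.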
Josh O'Brien
  • 6
    I like missing better. especially if you have a lot of NULL default values, you won't have x = NULL, y = NULL, z = NULL in your package documentation – rawr Feb 06 '15 at 16:29
  • Indeed this seems like the "intended" way of doing it. – SimonG Feb 06 '15 at 16:30
  • 5
    @rawr `missing()` is also more expressive in the sense that it "says what it means". Plus it allows users to pass in a value of NULL, in places where that makes sense! – Josh O'Brien Feb 06 '15 at 16:32
  • 51
    To me, there's big downside to using missing in this way: when skimming the function arguments you can no longer see which arguments are required and which are options. – hadley Feb 07 '15 at 13:41
  • 3
    `@param x numeric; something something; @param y numeric; **optional** something something; @param z logical; **optional** something something` – rawr Feb 07 '15 at 18:21
  • 1
    `missing()` is useless when you want to use proper checking of supplied parameters against a list though. For a function `Foo` with parameter `bar` and optional switch `a_or_b` (default value "a") you can write `Foo <- function(bar,a_or_b=c("a","b")) { if (length(a_or_b)>1) {a_or_b = "a"} print(missing(a_or_b)) return() }` Length checking here is because by default the list value is passed as the value of the parameter. For longer list values than "a" you could just refer more systematically by index, `a_or_b[1]` – Louis Maddox May 01 '15 at 04:17
  • 7
    `missing()` is terrible when you want to be passing arguments from one function to another. – John Smith Sep 15 '16 at 20:15
  • @JohnSmith Mind giving a simple example? I'm genuinely curious to see what you're referring to. Does it involve situations that involve a `...` formal argument? Here's an example of where it seems to work just fine: `j <- function(x) {if(missing(x)) "hah" else x}; f <- function(X) j(x=X); f(99); f()` – Josh O'Brien Sep 16 '16 at 00:12
  • @JoshO'Brien `a <- function(x = NULL) {return(b(x))};b <- function(x = NULL) {return(missing(x))};a();a(1);a(NULL);b();b(NULL)` – John Smith Sep 17 '16 at 21:07
  • @JoshO'Brien or `j <- function(x) {return(missing(x))}; f <- function(x) {j(2*x)}; f(99); f()` with `j <- function(x) {if(missing(x)) "hah" else x}; f <- function(X) j(x=2*X); f(99); f()` giving error due to 'wrong' if. – John Smith Sep 17 '16 at 21:20
  • @LouisMaddox Could you please elaborate on your comment about proper checking of supplied parameters against a list? I asked a specific question about it https://stackoverflow.com/questions/41324398/r-proper-checking-of-supplied-parameters-against-a-list-of-values – green diod Dec 25 '16 at 20:49
  • I found a problem with this solution because the missing variable y has marked forever as "missing" and you cannot build it again. Example fooBar <- function(x,y){if(missing(y)) {y = get0("y", inherits = TRUE) x + y} else { x + y }};y <- 1;fooBar(x = 2) – Captain Tyler Jan 15 '19 at 16:49
76

To be honest, I like the OP's first way of starting with a NULL default and then checking it with is.null (primarily because it is very simple and easy to understand). It may depend on the way people are used to coding, but Hadley seems to support the is.null way too:

From Hadley's book "Advanced-R" Chapter 6, Functions, p.84 (for the online version check here):

You can determine if an argument was supplied or not with the missing() function.

i <- function(a, b) {
  c(missing(a), missing(b))
}
i()
#> [1] TRUE TRUE
i(a = 1)
#> [1] FALSE  TRUE
i(b = 2)
#> [1]  TRUE FALSE
i(1, 2)
#> [1] FALSE FALSE

Sometimes you want to add a non-trivial default value, which might take several lines of code to compute. Instead of inserting that code in the function definition, you could use missing() to conditionally compute it if needed. However, this makes it hard to know which arguments are required and which are optional without carefully reading the documentation. Instead, I usually set the default value to NULL and use is.null() to check if the argument was supplied.
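
A minimal sketch of the pattern described in that last sentence, assuming a hypothetical function `bin_values` whose optional `breaks` argument needs a few lines to compute when it is not supplied:

```r
# Hypothetical function: 'breaks' is optional; NULL means "compute a default".
bin_values <- function(x, breaks = NULL) {
  if (is.null(breaks)) {
    # Default that takes more than one line to compute
    rng <- range(x, na.rm = TRUE)
    breaks <- seq(rng[1], rng[2], length.out = 5)
  }
  cut(x, breaks = breaks, include.lowest = TRUE)
}

bin_values(1:10)                         # four bins computed from the data
bin_values(1:10, breaks = c(1, 5, 10))   # two bins supplied by the caller
```

The signature still shows at a glance that `breaks` is optional, which is the readability point made in the quote.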

LyzandeR
  • 2
    Interesting. That sounds reasonable, but do you ever find yourself perplexed about which arguments to a function are required and which are optional? I'm not sure that I've *ever* actually had that experience... – Josh O'Brien Feb 06 '15 at 16:47
  • 2
    @JoshO'Brien I think I haven't had that problem with either coding style to be honest, at least it never was a major issue probably because of the documentation or reading the source code. And that is why I primarily say that it really is a matter of the coding style you are used to. I have been using the `NULL` way for quite a while and probably that is why I am more used to it when I see source codes. It seems more natural to me. That said, as you say base R takes both approaches so, it really comes down to individual preferences. – LyzandeR Feb 06 '15 at 16:55
  • 2
    By now, I really wish I could mark two answers as correct because what I really arrived at using both `is.null` and `missing` depending on the context and what the argument is used for. – SimonG Nov 05 '15 at 10:20
  • 5
    That 's ok @SimonG and thanks :). I agree that both answers are very good and they do depend on context sometimes. This is a very good question and I believe the answers provide very good information and knowledge which is the primary goal here anyway. – LyzandeR Nov 05 '15 at 10:56
32

These are my rules of thumb:

If default values can be calculated from other parameters, use default expressions as in:

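fun <- function(x,levels=base::levels(x)){
    # base:: is needed here: a default of levels(x) for an argument that is itself
    # named levels fails with "promise already under evaluation"
    # ... rest of the function body ...
}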

Otherwise, use missing():

fun <- function(x,levels){
    if(missing(levels)){
        # calculate levels here
    }
    # ... rest of the function body ...
}

In the rare case that you think a user may want to specify a default value that lasts an entire R session, use getOption:

fun <- function(x,y=getOption('fun.y','initialDefault')){# or getOption('pkg.fun.y',defaultValue)
    # ... rest of the function body ...
}
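
As a rough illustration of how such a session-wide default behaves, here is a sketch with a hypothetical function `greet` and a hypothetical option name `greet.greeting`. The default is looked up each time the function is called, so changing the option affects later calls.

```r
# Hypothetical function and option name, for illustration only.
greet <- function(name, greeting = getOption("greet.greeting", "Hello")) {
  paste0(greeting, ", ", name, "!")
}

greet("Ann")                           # falls back to "Hello" if the option is unset
old <- options(greet.greeting = "Hi")  # set a session-wide default
greet("Ann")                           # now uses "Hi"
greet("Ann", greeting = "Hey")         # an explicit argument still wins
options(old)                           # restore the previous state
```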

If some parameters apply depending on the class of the first argument, use an S3 generic:

fun <- function(x,...)
    UseMethod("fun")   # dispatch on the class of x


fun.character <- function(x,y,z){# y and z only apply when x is character
    # ... method body ...
}

fun.numeric <- function(x,a,b){# a and b only apply when x is numeric
    # ... method body ...
}

fun.default <- function(x,m,n){# otherwise arguments m and n apply
    # ... method body ...
}

Use ... only when you are passing additional parameters on to another function

cat0 <- function(...)
    cat(...,sep = '')

Finally, if you do choose to use ... without passing the dots on to another function, tell the user when your function receives parameters it does not use, since silently ignoring them can be very confusing:

fun <- function(x,...){
    params <- list(...)
    optionalParamNames <- letters   # the names this function accepts
    unusedParams <- setdiff(names(params),optionalParamNames)
    if(length(unusedParams))
        stop('unused parameters: ',paste(unusedParams,collapse = ', '))
    # ... rest of the function body ...
}
Jthorpe
  • the s3 method option was one of the first things that came to mind for me, too – rawr Feb 06 '15 at 17:22
  • 2
    In retrospect, I have become fond of the OP's method of assigning `NULL` in the function signature, as it more convenient for making functions that [chain](https://en.wikipedia.org/wiki/Method_chaining) nicely. – Jthorpe Jan 25 '16 at 22:22
11

There are several options and none of them are the official correct way and none of them are really incorrect, though they can convey different information to the computer and to others reading your code.

For the given example I think the clearest option would be to supply an identity default value, in this case do something like:

fooBar <- function(x, y=0) {
  x + y
}

This is the shortest of the options shown so far, and shortness can help readability (and sometimes even execution speed). It is clear that what is returned is the sum of x and y, and you can see that if y is not given a value it will be 0, which when added to x just results in x. Obviously, if something more complicated than addition is used, then a different identity value will be needed (if one exists).

One thing I really like about this approach is that it is clear what the default value is when using the args function, or even looking at the help file (you don't need to scroll down to the details, it is right there in the usage).

The drawback to this method is when the default value is complex (requiring multiple lines of code), then it would probably reduce readability to try to put all that into the default value and the missing or NULL approaches become much more reasonable.

Some of the other differences between the methods will appear when the parameter is being passed down to another function, or when using the match.call or sys.call functions.
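
A small sketch of one such difference, using hypothetical wrapper functions that are not part of the original example. When a missing argument is forwarded as a bare symbol, missing() in the inner function still reports TRUE, but as soon as the wrapper forwards an expression, the inner function can no longer detect that the value was never supplied. A NULL default forwards as an ordinary value.

```r
# Hypothetical inner functions, one per style
inner_missing <- function(y) missing(y)
inner_null    <- function(y = NULL) is.null(y)

# Hypothetical wrappers that pass their own argument down
outer_missing <- function(y) inner_missing(y)       # forwards the bare symbol
outer_expr    <- function(y) inner_missing(y + 0)   # forwards an expression
outer_null    <- function(y = NULL) inner_null(y)

outer_missing()    # TRUE: missingness is followed through the bare symbol
outer_missing(1)   # FALSE
outer_expr()       # FALSE: the inner function sees a supplied expression
outer_null()       # TRUE: NULL is forwarded like any other value
outer_null(1)      # FALSE
```

In the `outer_expr` case, any later use of `y` inside the inner function would fail with a "missing argument" error, which matches the kind of problem raised in the comments on the missing() answer.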

So I guess the "correct" method depends on what you plan to do with that particular argument and what information you want to convey to readers of your code.

Greg Snow
9

Just wanted to point out that the built-in sink function has good examples of different ways to set arguments in a function:

> sink
function (file = NULL, append = FALSE, type = c("output", "message"),
    split = FALSE)
{
    type <- match.arg(type)
    if (type == "message") {
        if (is.null(file))
            file <- stderr()
        else if (!inherits(file, "connection") || !isOpen(file))
            stop("'file' must be NULL or an already open connection")
        if (split)
            stop("cannot split the message connection")
        .Internal(sink(file, FALSE, TRUE, FALSE))
    }
    else {
        closeOnExit <- FALSE
        if (is.null(file))
            file <- -1L
        else if (is.character(file)) {
            file <- file(file, ifelse(append, "a", "w"))
            closeOnExit <- TRUE
        }
        else if (!inherits(file, "connection"))
            stop("'file' must be NULL, a connection or a character string")
        .Internal(sink(file, closeOnExit, FALSE, split))
    }
}
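
The `type` argument above uses the match.arg() idiom: the default is a vector of all allowed choices, and match.arg() picks the first one when nothing is supplied. A minimal sketch with a hypothetical function `pick_type`:

```r
# Hypothetical function showing only the match.arg() idiom
pick_type <- function(type = c("output", "message")) {
  type <- match.arg(type)  # first choice by default; validates anything supplied
  type
}

pick_type()         # "output"
pick_type("mess")   # "message" (partial matching is allowed)
# pick_type("other") would stop with an error listing the valid choices
```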
user5359531
8

I would tend to prefer using NULL for the clarity of what is required and what is optional. One word of warning about using default values that depend on other arguments, as suggested by Jthorpe. The value is not set when the function is called, but when the argument is first referenced! For instance:

foo <- function(x,y=length(x)){
    x <- x[1:10]
    print(y)
}
foo(1:20) 
#[1] 10

On the other hand, if you reference y before changing x:

foo <- function(x,y=length(x)){
    print(y)
    x <- x[1:10]
}
foo(1:20) 
#[1] 20

This is a bit dangerous, because it makes it hard to keep track of what y will be initialized to if it is not referenced early on in the function.
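
One way to guard against this, sketched here with a hypothetical variant named `foo_forced`, is to call force() on the argument at the top of the function so that the default is evaluated before anything else changes:

```r
# Hypothetical variant of foo: force(y) evaluates the default immediately,
# while x still holds the value the caller passed in.
foo_forced <- function(x, y = length(x)) {
  force(y)
  x <- x[1:10]
  y
}

foo_forced(1:20)
# [1] 20
```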

2

how about this?

fun <- function(x, ...){
  y=NULL
  parms=list(...)
  for (name in names(parms) ) {
    assign(name, parms[[name]])
  }
  print(is.null(y))
}

Then try:

> fun(1,y=4)
[1] FALSE
> fun(1)
[1] TRUE
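
A possible alternative to assigning each element of ... into the function's environment is to keep the optional values in a list of defaults and overlay whatever the caller supplied. This is only a sketch under assumptions: the function name `scale_shift` and the options `shift` and `scale` are made up for illustration, and it uses utils::modifyList to merge the two lists.

```r
# Hypothetical function: optional settings live in a named list of defaults.
scale_shift <- function(x, ...) {
  defaults <- list(shift = 0, scale = 1)
  opts <- utils::modifyList(defaults, list(...))  # caller's values override defaults
  unknown <- setdiff(names(opts), names(defaults))
  if (length(unknown) > 0) {
    stop("unused arguments: ", paste(unknown, collapse = ", "))
  }
  (x + opts$shift) * opts$scale
}

scale_shift(2)                         # 2
scale_shift(2, shift = 1)              # 3
scale_shift(2, shift = 1, scale = 10)  # 30
```

Keeping the options in a list avoids creating arbitrary local variables from user input, and the name check reports misspelled options instead of ignoring them.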
Keyu Nie