When to use missing versus NULL values for passing undefined function arguments in R, and why?

Question

To date, when writing R functions, I've passed undefined arguments as NULL values and then tested whether they are NULL, i.e.

f1 <- function (x = NULL) {
   if(is.null(x))
      ...
}

However, I recently discovered the possibility of passing undefined arguments as missing, i.e.

f2 <- function (x) {
   if(missing(x))
      ...
}

The R documentation states that:

Currently missing can only be used in the immediate body of the function that defines the argument, not in the body of a nested function or a local call. This may change in the future.

Clearly this is one disadvantage of using missing to detect undefined values. Are there any others that people are aware of? Or, to phrase the question in a more useful form: "When do you use missing versus NULL values for passing undefined function arguments in R, and why?"

I guess a case where `NULL` could be really convenient is when an argument in a function can just be ignored or treated as having length 0. E.g. in `f1 = function(x, y = NULL) c(x, y)`, calling `f1(3)` won't produce an error (as it would if the `NULL` default was absent) and doesn't need extra checking. — alexis_laz, Feb 25 '14 at 19:46

score 11 · Answer 1 · eddi · 2014-02-26T18:08:43.020

NULL is just another value you can assign to a variable. It's no different than any other default value you'd assign in your function's declaration.

missing, on the other hand, checks whether the user supplied that argument. You can run this check before the default is assigned because, thanks to R's lazy evaluation, the default is only evaluated when the variable is first used.
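
As a minimal sketch of that point (the function name `lazy_demo` and its deliberately failing default are made up for illustration), the default below would raise an error if it were ever evaluated, yet checking missing(x) first avoids touching it:

```r
# Hypothetical function: the default for `x` would fail if it were evaluated.
lazy_demo <- function(x = stop("default was evaluated")) {
  # missing() looks at how the function was called; it does not force `x`.
  if (missing(x)) {
    return("x not supplied")
  }
  x
}

res_default <- lazy_demo()    # the default is never evaluated, so no error
res_value   <- lazy_demo(42)  # a supplied value is used as usual

print(res_default)
print(res_value)
```

If the body used x before the missing(x) check, the call lazy_demo() would fail instead.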

A couple of examples of what you can achieve with this: arguments with no default value that you can still omit, e.g. file and text in read.table, and arguments with default values where you may specify only one of them, e.g. n and nmax in scan.
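
Both patterns can be sketched roughly as follows. The helpers `read_input` and `take_first` are hypothetical and only loosely modelled on the file/text and n/nmax arguments mentioned above; they are not the actual read.table or scan code.

```r
# Hypothetical helper: neither argument has a default, yet either may be omitted.
read_input <- function(file, text) {
  if (missing(file) && !missing(text)) {
    # Split the supplied string into lines instead of reading a file.
    return(strsplit(text, "\n", fixed = TRUE)[[1]])
  }
  if (missing(file)) {
    stop("supply either 'file' or 'text'")
  }
  readLines(file)
}

# Hypothetical helper: both arguments have defaults, but at most one may be set.
take_first <- function(x, n = -1L, nmax = -1L) {
  if (!missing(n) && !missing(nmax)) {
    stop("specify either 'n' or 'nmax', but not both")
  }
  limit <- if (!missing(n)) n else nmax
  if (limit < 0L) x else head(x, limit)
}

lines_out <- read_input(text = "a\nb")
first_two <- take_first(1:5, n = 2L)
all_vals  <- take_first(1:5)
both_set  <- tryCatch(take_first(1:5, n = 1L, nmax = 2L),
                      error = function(e) "error")
```

A NULL default could not express the second pattern as directly, because the function would have no way to tell a caller-supplied -1L from the default -1L.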

You'll find many other use cases by browsing through R code.

edited Feb 26 '14 at 18:08

answered Feb 25 '14 at 19:58

eddi

In R 3.0.2 `text` appears to be the only argument in `scan` that does not have a default value. Do you know if there is a particular reason for this? – joethorley Feb 26 '14 at 01:49

This is the germ of a great answer, but you need to expand on it: *use missing rather than NULL when you have multiple arguments indicating different/conflicting modes or use cases (e.g. `read.csv(file/text)`) and your default activity has to choose between one of those cases* – smci Apr 27 '18 at 23:58

score 8 · Accepted Answer · answered Feb 25 '14 at 20:04

missing(x) seems to be a bit faster than giving x a default of NULL and testing it with is.null(x).

> require('microbenchmark')
> f1 <- function(x=NULL) is.null(x)
> f2 <- function(x) missing(x)

> microbenchmark(f1(1), f2(1))
Unit: nanoseconds
  expr min  lq median    uq  max neval
 f1(1) 615 631  647.5 800.5 3024   100
 f2(1) 497 511  567.0 755.5 7916   100

> microbenchmark(f1(), f2())
Unit: nanoseconds
 expr min  lq median    uq  max neval
 f1() 589 619    627 745.5 3561   100
 f2() 437 448    463 479.0 2869   100

Note that in the f1 case, missing(x) still returns TRUE when you call f1(), but x has a value (the default) that can be read within f1.

The two tests answer different questions. missing() just means that the user did not pass any value. is.null() (with a NULL default arg) means that the user either did not pass anything or explicitly passed NULL.
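
A small sketch of the difference, using a made-up function `probe` that reports both tests for the same argument:

```r
# Hypothetical function: report both tests for one argument with a NULL default.
probe <- function(x = NULL) {
  c(missing = missing(x), is_null = is.null(x))
}

no_arg   <- probe()      # nothing supplied: both tests are TRUE
null_arg <- probe(NULL)  # NULL supplied explicitly: only is.null() is TRUE
val_arg  <- probe(1)     # a real value supplied: both tests are FALSE

print(rbind(no_arg, null_arg, val_arg))
```

Only the middle call separates the two tests: an explicit NULL counts as supplied for missing() but still satisfies is.null().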

By the way, plot.default() and chisq.test() use a NULL default for their second arguments. On the other hand, getS3method('t.test', 'default') uses a NULL default for its y argument and missing() for mu (in order to be prepared for many usage scenarios).

I think that some R users will prefer f1-type functions, especially when working with the *apply family:

sapply(list(1, NULL, 2, NULL), f1)

Achieving that in the f2 case is not so straightforward.
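
To make the contrast concrete, here is a sketch that runs both functions over the same list (f1 and f2 are redefined as above so the snippet is self-contained):

```r
f1 <- function(x = NULL) is.null(x)
f2 <- function(x) missing(x)

vals <- list(1, NULL, 2, NULL)

# f1 flags the NULL elements.
by_null <- sapply(vals, f1)

# f2 never does: sapply always supplies an argument, even when it is NULL.
by_missing <- sapply(vals, f2)

print(by_null)
print(by_missing)
```

Because every list element is passed as an actual argument, missing() cannot tell the NULL entries apart from the others.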

I didn't realize that in the `f1` case `x` is still reported as missing if you make the call `f1()`. It does seem that in general setting default values is most useful - also it makes it explicit to users that they don't have to supply values. — joethorley, Feb 25 '14 at 23:45

score 1 · Answer 3 · answered Sep 30 '19 at 14:00

In my opinion, it is not clear when the limitation on missing applies. The documentation, as you quote, says that missing can only be used in the immediate body of the function. A simple example, though, shows that this is not the case and that it works as expected when the arguments are passed to a nested function.

f1 = function(x, y, z){
  if(!missing(x))
    print(x)
  if(!missing(y))
    print(y)
}

f2 = function(x, y, z){
  if(!missing(z)) print(z)
  f1(x, y)
}
f1(y="2")
#> [1] "2"
f2(y="2", z="3")
#> [1] "3"
#> [1] "2"
f2(x="1", z="3")
#> [1] "3"
#> [1] "1"

I would like to see an example of a case when missing does not work in a nested function.

Created on 2019-09-30 by the reprex package (v0.2.1)

That note (which is in `?missing`) was written in 2002. Looks like this *did* change sometime in the last 17 years, but the docs didn't. — user2554330, Sep 30 '19 at 14:11
Or maybe the quote is about nested functions, i.e. something like `f2 <- function(z){ f1 <- function(){ if(missing(z)) cat("f1: z is missing\n") }; f1() }` where `f2()` generates an error. — user2554330, Oct 01 '19 at 09:23
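
A sketch of that nested case, together with one possible workaround. The names `outer_bad` and `outer_ok` are made up for illustration, and the error is caught with tryCatch so the snippet runs to the end:

```r
# missing(z) is called inside a function that does not define `z` as an argument.
outer_bad <- function(z) {
  inner <- function() missing(z)  # `z` is not an argument of inner()
  inner()
}
bad_result <- tryCatch(outer_bad(), error = function(e) "error")

# Possible workaround: evaluate missing(z) in the function that owns `z`
# and let the nested function read the stored result.
outer_ok <- function(z) {
  z_missing <- missing(z)
  inner <- function() z_missing
  inner()
}

print(bad_result)
print(outer_ok())
print(outer_ok(1))
```

Storing the logical in the outer function sidesteps the restriction, since the nested function only reads an ordinary variable.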
