
Some functions, like `browser`, only make sense when used interactively.

It is widely accepted that the `subset` function should only be used interactively.
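A minimal sketch of the usual problem, using a made-up data frame and hypothetical helper names. `subset` evaluates its condition inside the data frame first, so a function argument that shares a column's name is silently shadowed. A condition built from constants is unaffected.

```r
df <- data.frame(id = 1:3, score = c(10, 20, 30))

# Hypothetical helper: the argument `id` has the same name as a column.
# Inside subset(), both sides of `id == id` resolve to the column,
# so every row matches.
by_id_subset <- function(data, id) subset(data, id == id)

# Standard evaluation: data$id is the column, id is the argument.
by_id_bracket <- function(data, id) data[data$id == id, , drop = FALSE]

# A constant condition inside subset() is unambiguous.
first_row <- function(data) subset(data, id == 1)

nrow(by_id_subset(df, 2))   # 3
nrow(by_id_bracket(df, 2))  # 1
nrow(first_row(df))         # 1
```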

Similarly, `sapply` isn't good for programmatic use, since it doesn't simplify the result for zero-length inputs: you get an empty list instead of a vector.
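To make the zero-length case concrete, here is a small sketch with made-up inputs. It also shows `vapply`, which takes a template for each result (here `integer(1)`) and so returns a vector of the declared type even when the input is empty.

```r
x_full  <- list(1:3, letters)
x_empty <- list()

sapply(x_full, length)                # integer vector: 3 26
sapply(x_empty, length)               # list()
vapply(x_empty, length, integer(1))   # integer(0)
```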

I'm trying to make an exhaustive list of functions that are not suitable for programmatic use.

The plan is to make a tool for package checking to see if any of these functions are called and give a warning.
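One possible shape for such a check, sketched under assumptions: the package's code lives in `.R` files, and `flagged` and `check_r_files` are hypothetical names. It relies only on base `parse()` and `all.names()`, so names that appear inside strings or comments are not reported, unlike a plain text search.

```r
# Hypothetical list of functions to flag; modify to suit.
flagged <- c("browser", "sapply", "subset", "attach")

# Hypothetical checker: warn about flagged names in each R source file.
check_r_files <- function(paths, bad = flagged) {
  hits <- lapply(paths, function(p) {
    # parse() stops with an error if the file has a syntax error.
    exprs <- parse(p, keep.source = FALSE)
    # all.names() returns every symbol in the parsed code, function names
    # included; string literals and comments are not symbols.
    intersect(bad, all.names(exprs))
  })
  names(hits) <- basename(paths)
  for (i in seq_along(hits)) {
    if (length(hits[[i]]) > 0) {
      warning(names(hits)[i], " uses: ", paste(hits[[i]], collapse = ", "),
              call. = FALSE)
    }
  }
  invisible(hits)
}
```

A call might look like `check_r_files(dir("mypackage/R", pattern = "\\.[Rr]$", full.names = TRUE))`, where the directory is a placeholder. Because `all.names()` also returns variable names, a variable that happens to be called `subset` would still be flagged.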

There are other functions, like `file.choose` and `readline`, that require interactivity, but these are OK to include in packages, since their end use will be interactive. I don't care too much about these for this use case, but feel free to add them to the list.
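A common way to keep such calls safe inside a package is to guard them with `interactive()` and fall back to a default otherwise. This is only a sketch; `ask_name` is a hypothetical helper.

```r
# Hypothetical helper: prompt only when a human is present.
ask_name <- function(default = "anonymous") {
  if (!interactive()) {
    # Rscript, batch jobs and R CMD check runs take this branch.
    return(default)
  }
  answer <- readline("Your name: ")
  if (nzchar(answer)) answer else default
}

ask_name()   # returns "anonymous" in a non-interactive session
```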

Which functions have I missed?

Richie Cotton
  • imho `attach` should never be used, but especially not programmatically. And there are of course `View`, `edit` and friends. I'm not sure I agree with `sapply`. – Roland Apr 27 '14 at 10:58
  • @Roland Agreed with `attach`. My view on `sapply` is that it causes too many obscure bugs due to zero-length inputs. E.g., `sapply(x, length)` returns an integer vector unless `x` has length `0`, in which case you get a list. If you know that the size of the output is constant then use `vapply`, otherwise use `lapply`. – Richie Cotton Apr 27 '14 at 11:02
  • Incidentally, my motivation for this question is that I keep accidentally leaving calls to `browser` in my code, which looks silly if you check it into a repository. – Richie Cotton Apr 27 '14 at 11:07
  • Maybe you should also warn against `cat`. Often `message` should be used instead. – Roland Apr 27 '14 at 11:41
  • sapply is a special case of lapply anyway – rawr Apr 27 '14 at 11:41
  • This may be a bit drastic. The problem with functions such as `subset` when used in programming is that they use non-standard evaluation of their arguments. This is only a potential problem if those arguments that are evaluated using non-standard evaluation are passed variables. Such functions can still be used safely in functions if the arguments subject to non-standard evaluation are passed constant expressions. For example, `subset(data, id == 1)` would not be a problem in a function. – G. Grothendieck Apr 27 '14 at 12:47
  • @G.Grothendieck OK, `subset` isn't very dangerous. One thing that would be amazing for R would be to have MATLAB's automated code checking and fixing facility. http://bit.ly/1iqehBz This which-functions-should-I-avoid idea is the "what can I build in a day?" equivalent. – Richie Cotton Apr 27 '14 at 13:25
  • This almost goes without saying, but in case any new R users are looking at this answer, avoid `for` loops where a vectorized alternative exists, since they are often much slower. – statsRus Apr 27 '14 at 13:54
  • @Roland It just so happens that [I have a case where `attach` is required](https://github.com/klmr/modules/blob/e5e947ce41b899cc458c96fc0c523ce384aa70ab/R/import.r#L81) (but it’s worth noting that this is only called when the user is either in interactive mode, or passing a parameter that’s discouraged anyway). – Konrad Rudolph Apr 27 '14 at 13:58
  • `sapply` is fine when using `simplify=FALSE`. To get the same output as `sapply(letters[1:3], "(", simplify=FALSE)` with `lapply`, you'd have to write the redundant `setNames(lapply(letters[1:3], "("), letters[1:3])`. – GSee Apr 27 '14 at 14:43
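A quick check of GSee's point, sketched with `toupper` swapped in for `"("` so the output is easier to read. With a character input and `simplify = FALSE`, `sapply` always returns a list named by the input, the same as the `setNames(lapply(...))` form.

```r
x <- letters[1:3]
a <- sapply(x, toupper, simplify = FALSE)  # named list: a = "A", b = "B", c = "C"
b <- setNames(lapply(x, toupper), x)
identical(a, b)                            # TRUE
```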

2 Answers


(Feel free to edit.)

The following functions should be handled with care (which does not necessarily mean they are not suitable for programming):

  • Functions whose output class can vary depending on the inputs: `sapply`, `mapply` (by default)

  • Functions whose behavior changes depending on the input length: `sample`, `seq`

  • Functions that evaluate some of their arguments within a data frame or environment (non-standard evaluation): `$`, `subset`, `with`, `within`, `transform`

  • Functions that go against normal environment usage: `attach`, `detach`, `assign`, `<<-`

  • Functions that allow partial matching: `$`

  • Functions that only make sense in interactive usage: `browser`, `recover`, `debug`, `debugonce`, `edit`, `fix`, `menu`, `select.list`

  • Functions that can be a security risk (arbitrary code execution) if used with user input: `source`, `eval(parse(text = ...))`, `system`
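The length-dependent behavior of `sample` and `seq`, and the partial matching of `$`, are easiest to see side by side. This sketch uses made-up values; `resample` is a commonly used indexing idiom, not a base function.

```r
# sample(): a single number n >= 1 is treated as 1:n.
x <- c(10)
length(sample(x))   # 10, a permutation of 1:10, not of c(10)

# Indexing idiom that always samples from the elements of x.
resample <- function(x, ...) x[sample.int(length(x), ...)]
resample(x)         # 10

# seq(): a length-one argument is treated as the end point.
y <- c(5)
seq(y)              # 1 2 3 4 5
seq_along(y)        # 1

# $ on a list partially matches names; [[ ]] is exact by default.
l <- list(value = 1)
l$val               # 1
l[["val"]]          # NULL
```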

Also, to some extent, this applies to every function that generates warnings rather than errors. I recommend using `options(warn = 2)` to turn all warnings into errors in a programming application. Specific cases can then be allowed via `suppressWarnings` or `try`.
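A sketch of that workflow, assuming a hypothetical function `noisy` that signals a warning. With `warn = 2` the warning becomes an error that `try` can catch, while `suppressWarnings` still lets a known case through. The previous option value is restored afterwards.

```r
# Hypothetical function that warns but still returns a value.
noisy <- function() {
  warning("something looks off")
  "done"
}

old <- options(warn = 2)                 # warnings are now errors
strict  <- try(noisy(), silent = TRUE)   # object of class "try-error"
allowed <- suppressWarnings(noisy())     # "done": this case is allowed
options(old)                             # restore the previous setting
```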

flodel
  • Interestingly, since R 3.1.0 partial matching of `data.frame` column names gives a warning, see [NEWS](http://cran.rstudio.com/src/base/NEWS). – gagolews Apr 27 '14 at 14:28
  • ...but not lists (for example, `cars$d` vs. `as.list(cars)$d`), sigh. – flodel Apr 27 '14 at 14:52

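This is in answer to the poster's comment under the question. The function below takes a function and returns the flagged functions it finds, each with the position of the top-level statement in the function body where it occurs (the opening `{` counts as 1). It can generate false positives (for example, the names inside the `bad` string vector are themselves reported), but these are only warnings anyway, so that does not seem too bad. Modify `bad` to suit.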

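```r
badLines <- function(func) {
    # Functions to flag; modify to suit.
    bad <- c("sapply", "subset", "attach")
    # Whole-word regexes, e.g. "\\bsapply\\b".
    regex <- paste0("\\b", bad, "\\b")
    # grep() coerces body(func) to one string per top-level statement,
    # so the matches are statement positions ("{" is position 1).
    result <- sort(unlist(sapply(regex, FUN = grep, body(func), simplify = FALSE)))
    # Strip the literal "\b" markers from the names.
    setNames(result, gsub("\\b", "", names(result), fixed = TRUE))
}
badLines(badLines)

## sapply1  subset  attach sapply2 
##       2       2       2       4 
```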
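Because `badLines` searches deparsed text, names that only appear in strings are reported too (statement 2 in the output above). A possible variant, sketched here with the hypothetical names `badStatements` and `example_fn`, looks at the symbols of each top-level statement via `all.names()` instead, so string contents are ignored. Positions are counted the same way, with `{` as 1.

```r
# Hypothetical variant: report, for each flagged name, the top-level
# statements of func whose parsed symbols include that name.
badStatements <- function(func, bad = c("sapply", "subset", "attach")) {
  stmts <- as.list(body(func))
  # Symbols used by each statement; string literals are not symbols.
  used <- lapply(stmts, all.names)
  hits <- lapply(bad, function(b) {
    which(vapply(used, function(u) b %in% u, logical(1)))
  })
  names(hits) <- bad
  hits[lengths(hits) > 0]   # keep only names that were found
}

# Hypothetical example: statement 2 mentions flagged names only in strings.
example_fn <- function(df) {
  bad_names <- c("subset", "attach")
  out <- sapply(df, mean)
  out
}

badStatements(example_fn)   # only sapply is reported, at position 3
```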
G. Grothendieck
  • Some people might find the `lint` package useful for this kind of code checking. It is flexible enough that you can define your own rules. It has a pretty steep learning curve and poor documentation/support though. – flodel Apr 27 '14 at 13:53
  • @flodel Nice tip about the `lint` package. I just tried `lint(dir("mypackage/R", full.names = TRUE))` and it threw up a load of stuff I hadn't spotted before. – Richie Cotton Apr 28 '14 at 05:56