The documentation of subset() states:

For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Even after reading Advanced R - Non-standard evaluation in subset, it is not clear to me how this can lead to errors. Assume I have the following code snippet:

myfun <- function(...) {
  ...
  df <- data.frame(col1 = c("a", "b", NA), col2 = 1:3, col3 = 11:13)
  df_s <- subset(x = df, subset = col1=="a", select = c(col1, col2))
  # In the following I only use df_s in some way
  return(...)
}

To me, this looks safe to use in scripts and functions. Is it?

Minor issue: Can I include row.names(df_s) <- NULL in subset using ...? I couldn't figure that out.

Christoph
  • This link was posted the other day, I found it quite useful: https://shipt.tech/https-shipt-tech-advanced-programming-and-non-standard-evaluation-with-dplyr-e043f89deb3d –  Oct 19 '22 at 08:43
  • Does this answer your question? [Why is `[` better than `subset`?](https://stackoverflow.com/questions/9860090/why-is-better-than-subset) – bretauv Oct 19 '22 at 09:03
  • @bretauv Thanks, this adds useful information, and I hadn't found that question. But I think it is not exactly my question (see my edit): I really only use the subset, and things like `subscramble(mtcars, cyl == 4)` in your reference are not even possible. If I understand the answer there correctly, this should be safe? – Christoph Oct 19 '22 at 09:30
  • If you use it interactively (meaning that you apply it directly to your dataset), it should be safe. Problems start to arise when you use it in custom functions. – bretauv Oct 19 '22 at 09:34
  • @bretauv I would say `myfun` is a custom function. But passing an argument like `cyl == 4` to `myfun` is not possible. – Christoph Oct 19 '22 at 09:36
  • To be a bit clearer, problems arise when you try to pass the subsetting conditions as an argument of a custom function. If you define a data frame and write its condition directly inside a function, it's fine. But don't use subset if you want to create a custom function where users can pass whatever subsetting condition they want. I don't see a problem with the example in your post. – bretauv Oct 19 '22 at 09:37
  • @bretauv Ok, in my case, subset is safe. If you feel confident, you can merge your comments into an answer... – Christoph Oct 19 '22 at 09:40

1 Answer

This warning in the docs comes from the fact that subset() uses non-standard evaluation, which makes it hard to use in custom functions when you want to pass custom subsetting conditions to a dataset. This is already addressed in this answer.
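
To illustrate the failure mode, here is a minimal sketch. `filter_rows` is a hypothetical wrapper made up for this example, and `df` is the data frame from the question:

```r
df <- data.frame(col1 = c("a", "b", NA), col2 = 1:3, col3 = 11:13)

# Hypothetical wrapper that forwards a user-supplied condition to subset()
filter_rows <- function(data, condition) {
  subset(data, condition)
}

# subset() captures the symbol `condition`, not the expression col1 == "a".
# Forcing `condition` evaluates col1 == "a" in the caller's environment,
# where no object named col1 exists, so the call fails.
res <- try(filter_rows(df, col1 == "a"), silent = TRUE)
inherits(res, "try-error")

# If a variable with a matching name happens to exist outside the data frame,
# the condition is silently evaluated against that variable instead.
col2 <- 3
res2 <- filter_rows(df, col2 == 3)
nrow(res2)  # all rows of df are returned, not only the row where col2 is 3
```

The second call is the more dangerous one, because it returns a result without any error or warning.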

In your case, however, you want to use subset() in a custom function only to apply a known subsetting condition to a known dataset, not to pass custom conditions to subset(). Using subset() this way is not a problem.
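
For comparison, here is a sketch of the hard-coded case from the question next to a `[` version (`df_b` is a name made up for this example). As far as I know, subset() has no argument for resetting row names, so that is done as a separate step here:

```r
df <- data.frame(col1 = c("a", "b", NA), col2 = 1:3, col3 = 11:13)

# Hard-coded condition on a known data frame, as in the question
df_s <- subset(x = df, subset = col1 == "a", select = c(col1, col2))

# Same rows and columns with `[`; which() drops the NA in col1,
# which subset() does implicitly
df_b <- df[which(df$col1 == "a"), c("col1", "col2")]

# Resetting the row names is a separate step in both variants
row.names(df_s) <- NULL
row.names(df_b) <- NULL

all.equal(df_s, df_b)
```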

bretauv