
It has been suggested that functions in R packages should preferably use standard evaluation (see here), especially to avoid having to declare variables via utils::globalVariables.

If I'm using non-standard evaluation with the dplyr package, what would be the "translation" into standard evaluation for the following code snippet, especially for the table() call?

grp and dep are numeric variables in the data frame mydf, while x is a factor.

Non-standard evaluation:

pvals <- mydf %>%
  dplyr::group_by(grp) %>%
  dplyr::summarise(N = n(),
    p = suppressWarnings(stats::chisq.test(table(x, dep))$p.value))

Standard evaluation?

pvals <- mydf %>%
  dplyr::group_by_("grp") %>%
  dplyr::summarise_(N = n(),
    p = suppressWarnings(stats::chisq.test(table("x", "dep"))$p.value))

And what about function calls with ggplot? Does ggplot have standard-evaluation support?

Edit: Added reproducible example.

library(dplyr)
data(ChickWeight)
ChickWeight %>%
  dplyr::group_by(Diet) %>%
  dplyr::summarise(N = n(),
    p = suppressWarnings(stats::chisq.test(table(weight, Time))$p.value))

  • Why don't you use one of the built-in data sets for your example so that it's easily reproducible? – talat Feb 28 '16 at 13:24

2 Answers

You can avoid hard-coding the variable names within your function by using rlang quasiquotation instead.

From your example, within a function context, I would write:

#' Chisq table
#' @importFrom rlang enquo !!
#' @importFrom magrittr %>%
#'
#' @param data Dataset
#' @param x,y,group bare variable names
#' @export
chisq_table <- function(data, x, y, group){
  x <- enquo(x)
  y <- enquo(y)
  group <- enquo(group)

  data %>%
    dplyr::group_by(!!group) %>%
    dplyr::summarise(
      N = dplyr::n(),
      p = suppressWarnings(stats::chisq.test(table(!!x, !!y))$p.value)
    )
}

data(ChickWeight)
chisq_table(data = ChickWeight, x = weight, y = Time, group = Diet)

## # A tibble: 4 x 3
##   Diet      N        p
##   <fct> <int>    <dbl>
## 1 1       220 4.42e-16
## 2 2       120 3.76e- 7
## 3 3       120 4.74e- 6
## 4 4       118 1.33e- 5

This does not trigger a NOTE when checking the package, and it makes maintaining your functions easier if the column names in your datasets ever change.
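
If the column names arrive as strings rather than bare names, the same pattern works with rlang::sym() in place of enquo(). A minimal sketch, assuming the same setup (chisq_table_chr is a hypothetical name):

# Hypothetical variant of the function above: column names are
# passed as strings, and rlang::sym() turns each string into a
# symbol that can be unquoted with !! just like the enquo() version.
chisq_table_chr <- function(data, x, y, group){
  x <- rlang::sym(x)
  y <- rlang::sym(y)
  group <- rlang::sym(group)

  data %>%
    dplyr::group_by(!!group) %>%
    dplyr::summarise(
      N = dplyr::n(),
      p = suppressWarnings(stats::chisq.test(table(!!x, !!y))$p.value)
    )
}

chisq_table_chr(ChickWeight, x = "weight", y = "Time", group = "Diet")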

– Romain

If you want to use dplyr, I would just ignore the false positive from codetools::checkUsagePackage().
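
For reference, a minimal sketch of the globalVariables route (the variable names are the ones from the question; the file name is just a common convention):

# Usually placed at the top level of a package source file, e.g. R/globals.R;
# this declares the NSE column names so R CMD check does not flag
# them as undefined global variables.
utils::globalVariables(c("grp", "x", "dep"))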

– Florian
  • I would also like to know whether using standard evaluation is the better habit, and when to use it. I have some function calls to ggplot where I can't use standard evaluation and have to use `globalVariables` to make my package pass R CMD check. – Daniel Feb 28 '16 at 14:17
  • I would just use R CMD check; if you get a false positive, you can fix it with http://www.inside-r.org/r-doc/utils/globalVariables, but I have never encountered this issue. – Florian Feb 28 '16 at 14:34
  • With regards to standard vs. non-standard evaluation: take `x <- data.frame(a=1:4, b=5:8)`. If you want to order it by both columns, you can do `order(x$a, x$b)` or `with(x, order(a, b))`; both work, and the second is in many cases more readable and less typing. But when R CMD check verifies that all your variables are defined, it might not recognize that you are inside a `with()` and report a false positive. – Florian Feb 28 '16 at 14:41
  • So I would say the better habit is whatever is more readable for you! And the `globalVariables` function is just a way to deal with false positives, which seems fine to me. – Florian Feb 28 '16 at 14:46
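
The comparison from the comments above, written out as a runnable snippet:

# Standard evaluation: the columns are fully qualified, so static
# checks can verify that a and b exist.
x <- data.frame(a = 1:4, b = 5:8)
order(x$a, x$b)

# Non-standard evaluation: with() evaluates a and b inside x, which
# is shorter but looks like undefined global variables to R CMD check.
with(x, order(a, b))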