How to use standard evaluation in dplyr summarise_

Question

I have looked in several places but I just can't figure out how to do this. The approach seems to have changed a few times, which makes it even more confusing.

I want to summarise NumbOfBx by Endoscopist as part of a function. I have the following data frame:

vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", 
"John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", 
"Mar Gret ", "Phil Ip ", "Phil Ip "), NumbOfBx = c(2, 4, NA, 
2, 12, 12, NA, NA, NA, 3, NA)), row.names = 100:110, .Names = c("Endoscopist", 
"NumbOfBx"), class = "data.frame")

My function is:

NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>% summarise(avg = mean(y, na.rm = T))
}

which I call with:

NumBx(vv,"Endoscopist","NumbOfBx")

This gives me the following warnings:

Warning messages:
1: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA

I changed the function to use summarise_ but I got the same thing. Then I realised that summarise_ specifically (as opposed to group_by_) needs standard evaluation, so I tried this (from this Stack Overflow example):

library(lazyeval)
NumBx <- function(x, y, z) {
  x <- data.frame(x)
  x <- x[!is.na(x[,y]), ]
  NumBxPlot <- x %>% group_by_(z) %>% 
      summarise_(sum_val = interp(~mean(y, na.rm = TRUE), var = as.name(y)))
}

but I still get the same warnings:

Warning messages:
1: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
2: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA
3: In mean.default(y, na.rm = T) :
  argument is not numeric or logical: returning NA

My intended output is:

Endoscopist   Avg
Jupi Ter       4
John Boy       28
Phil Ip        3

Try using `get(y)` in your summarise function. I tested this and got the same error when trying to use a variable that refers to a column name. The `get()` function solved it for me. Might need to do the same in your `group_by` — Balter, Aug 29 '17 at 15:34

alistaire · Answer 1 · 2017-08-29T16:07:17.770

Using rlang (the replacement for lazyeval), you could do

library(dplyr)

vv <- structure(list(Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ", "John Boy ", "John Boy ", "Mar Gret ", "John Boy ", "Mar Gret ", "Phil Ip ", "Phil Ip "), 
                     NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA)), 
                row.names = 100:110, .Names = c("Endoscopist", "NumbOfBx"), class = "data.frame")

num_bx <- function(.data, group, variable) {
    group <- enquo(group)
    variable <- enquo(variable)

    .data %>% 
        tidyr::drop_na(!!variable) %>% 
        group_by(!!group) %>% 
        summarise(avg = mean(!!variable))
}

vv %>% num_bx(Endoscopist, NumbOfBx)
#> # A tibble: 3 x 2
#>   Endoscopist   avg
#>         <chr> <dbl>
#> 1   John Boy      7
#> 2   Jupi Ter      4
#> 3    Phil Ip      3

or if you want to keep it as strings instead of unquoted names,

num_bx <- function(.data, group, variable) {
    group <- rlang::sym(group)
    variable <- rlang::sym(variable)

    .data %>% 
        tidyr::drop_na(!!variable) %>% 
        group_by(!!group) %>% 
        summarise(avg = mean(!!variable))
}

vv %>% num_bx("Endoscopist", "NumbOfBx")
#> # A tibble: 3 x 2
#>   Endoscopist   avg
#>         <chr> <dbl>
#> 1   John Boy      7
#> 2   Jupi Ter      4
#> 3    Phil Ip      3
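
For reference, more recent releases offer shorter spellings of both variants. The sketch below assumes dplyr 1.0.0 or later (for across() and the .groups argument) and rlang 0.4.0 or later (for the {{ }} operator). The function names num_bx_embrace and num_bx_str are made up for illustration, and the data frame is rebuilt with data.frame() so the block runs on its own.

```r
library(dplyr)

# Same data as above, rebuilt so this block is self-contained
vv <- data.frame(
    Endoscopist = c("John Boy ", "Jupi Ter ", "Jupi Ter ", "John Boy ",
                    "John Boy ", "John Boy ", "Mar Gret ", "John Boy ",
                    "Mar Gret ", "Phil Ip ", "Phil Ip "),
    NumbOfBx = c(2, 4, NA, 2, 12, 12, NA, NA, NA, 3, NA),
    stringsAsFactors = FALSE
)

# Unquoted names: {{ x }} stands in for the enquo(x) / !!x pair
num_bx_embrace <- function(df, group, variable) {
    df %>%
        filter(!is.na({{ variable }})) %>%
        group_by({{ group }}) %>%
        summarise(avg = mean({{ variable }}), .groups = "drop")
}

# Strings: the .data pronoun looks a column up by name,
# and all_of() selects the grouping column from a character vector
num_bx_str <- function(df, group, variable) {
    df %>%
        filter(!is.na(.data[[variable]])) %>%
        group_by(across(all_of(group))) %>%
        summarise(avg = mean(.data[[variable]]), .groups = "drop")
}

res_embrace <- num_bx_embrace(vv, Endoscopist, NumbOfBx)
res_str <- num_bx_str(vv, "Endoscopist", "NumbOfBx")
```

Both calls should give the same three-row summary as the versions above. The argument is named df rather than .data here only to avoid confusion with the .data pronoun.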

@aosmith Oops, yep, the old package was called lazyeval. "tidy eval" is just a concept, not a package. Fixed above. — alistaire, Aug 29 '17 at 16:08

Artem Sokolov · Answer 2 · 2017-08-29T15:46:15.630

Following the dplyr programming vignette, define your function as follows:

NumBx <- function( x, y, z )
{
    yy <- enquo( y )
    zz <- enquo( z )

    data.frame(x) %>% filter( !is.na(!!yy) ) %>% group_by( !!zz ) %>%
        summarize( avg = mean(!!yy) )
}

You can now call it as:

NumBx( vv, NumbOfBx, Endoscopist )
#   Endoscopist   avg
#         <chr> <dbl>
# 1   John Boy      7
# 2   Jupi Ter      4
# 3    Phil Ip      3

Some notes:

The order of arguments in your call seemed reversed. You want to group by z, but you were passing NumbOfBx as the z argument.
na.rm=TRUE is redundant. You are already filtering out the rows where the y variable is NA.
The mean of John Boy should be 7, not 28 (the value stated in your intended output).
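
As a side note on the warnings in the question: inside the original function, y holds a character string, so mean(y) is asked to average the text itself rather than the column of that name. A minimal base R sketch of that behaviour (the variable names here are for illustration only):

```r
# y holds the column *name* as a string, not the column itself
y <- "NumbOfBx"

# mean() on a character vector returns NA and warns
# "argument is not numeric or logical: returning NA"
res <- suppressWarnings(mean(y, na.rm = TRUE))
is.na(res)

# Looking the column up by name gives mean() actual numbers
df <- data.frame(NumbOfBx = c(2, 4, NA))
col_mean <- mean(df[[y]], na.rm = TRUE)
```

This is why the string has to be turned into a column reference (with enquo, sym, or similar) before summarise sees it.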
