17

I'm trying to write a function that subsets my original data frame and then uses dplyr's select and mutate to give me the number of large/small entries, based on the sum of the width and length of the sepals/petals.

filter <- function (spp, LENGTH, WIDTH) {
  d <- subset (iris, subset=iris$Species == spp) # This part seems to work just fine
  large <- d %>%                       
    select (LENGTH, WIDTH) %>%   # This is where the problem arises.
    mutate (sum = LENGTH + WIDTH) 
  big_samples <- which(large$sum > 4)
 return (length(big_samples)) 
}

Basically, I want the function to return the number of large flowers. However, when I run the function I get the following error:

filter("virginica", "Sepal.Length", "Sepal.Width")

 Error: All select() inputs must resolve to integer column positions.
The following do not:
*  LENGTH
*  WIDTH 

What am I doing wrong?

Tung
ari8888
  • 3
    `dplyr` functions use non-standard evaluation. That is why you do not have to quote your variable names when you do something like `select(mtcars, mpg)`, and why `select(mtcars, "mpg")` doesn't work. When you use `dplyr` in functions, you will likely want to use "standard evaluation". See `vignette("nse")` for more details. – ialm Dec 09 '15 at 19:08
  • but why the function? – MLavoie Dec 09 '15 at 19:10
  • 2
    A quick and dirty solution is to change `select(LENGTH, WIDTH) %>%` to `select(get(LENGTH), get(WIDTH)) %>%`. However, you should really be using `select_()` and `mutate_()` in your functions. – ialm Dec 09 '15 at 19:10

3 Answers

22

You are running into a non-standard evaluation (NSE) versus standard evaluation (SE) problem; see the dplyr NSE vignette for more info.

Briefly, dplyr uses non-standard evaluation (NSE) of names: it expects bare column names, so passing column names into a function as strings breaks unless you use the standard evaluation (SE) versions.

The SE versions of the dplyr functions end in an underscore (_). select_ works with your original string arguments as they are.

However, things get more complicated inside functions when an expression has to be built, as in mutate. We can use lazyeval::interp to convert most function arguments into column names; see how the mutate call becomes a mutate_ call in your function below and, more generally, the help page ?lazyeval::interp.
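To make the role of lazyeval::interp more concrete, here is a minimal sketch of what it builds before mutate_ ever sees it. The variable names expr, rhs and first_sum are only illustrative, and the sketch assumes the lazyeval package is installed.

```r
# Column names arrive as strings, as in the question's function call.
LENGTH <- "Sepal.Length"
WIDTH  <- "Sepal.Width"

# interp() substitutes the placeholders X and Y in the formula with the
# symbols created by as.name().
expr <- lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH))

# The right-hand side of the one-sided formula now refers to real columns.
rhs <- deparse(expr[[2]])
rhs
#> [1] "Sepal.Length + Sepal.Width"

# Evaluated against the first row of iris (5.1 and 3.5), it gives their sum.
first_sum <- eval(expr[[2]], iris[1, ])
```

So the strings are turned into symbols first, and only then spliced into the expression that mutate_ evaluates against the data.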

Try:

library(dplyr)

filter <- function(spp, LENGTH, WIDTH) {
    d <- subset(iris, subset = iris$Species == spp)
    large <- d %>%
        # SE version: accepts the column names as strings
        select_(LENGTH, WIDTH) %>%
        # interp() swaps X and Y for the columns named in LENGTH and WIDTH
        mutate_(sum = lazyeval::interp(~X + Y, X = as.name(LENGTH), Y = as.name(WIDTH)))
    big_samples <- which(large$sum > 4)
    return(length(big_samples))
}
Mikko
jeremycg
  • This is a great solution to the problem. Just out of curiosity, would anyone have an easier or simpler function that could be used achieve the same outcome? – ari8888 Dec 10 '15 at 20:33
  • 1
    here's what I would do: `myfun <- function(species, col1, col2){ sum(iris$Species ==species & (iris[[col1]]+iris[[col2]]) > 4) }` – jeremycg Dec 10 '15 at 21:51
13

UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.

See http://dplyr.tidyverse.org/articles/programming.html for more details.

library(dplyr)

filter_big <- function(spp, LENGTH, WIDTH) {
  LENGTH <- enquo(LENGTH)                    # Create quosure
  WIDTH  <- enquo(WIDTH)                     # Create quosure

  iris %>% 
    filter(Species == spp) %>% 
    select(!!LENGTH, !!WIDTH) %>%            # Use !! to unquote the quosure
    mutate(sum = (!!LENGTH) + (!!WIDTH)) %>% # Use !! to unquote the quosure
    filter(sum > 4) %>% 
    nrow()
}

filter_big("virginica", Sepal.Length, Sepal.Width)
#> [1] 50
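For readers wondering what a quosure actually holds, the following is a small sketch rather than part of the answer itself. The helper capture_col is hypothetical; enquo, is_quosure and as_label are rlang functions, and the sketch assumes a reasonably recent rlang is installed.

```r
library(rlang)

# Hypothetical helper: captures whatever the caller typed for `col`
# instead of evaluating it.
capture_col <- function(col) {
  enquo(col)
}

# No error here, even though no object called Sepal.Length exists in the
# calling environment: the expression is stored, not evaluated.
q <- capture_col(Sepal.Length)

is_quosure(q)
#> [1] TRUE
as_label(q)
#> [1] "Sepal.Length"
```

Unquoting with !! inside select() or mutate() then hands that stored expression back to dplyr, which is why the bare column names survive the trip through the function.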
Brad Cannell
5

If quosures and quasiquotation are too much for you, use either .data[[ ]] (with column names passed as strings) or rlang's {{ }} ("curly curly", with bare column names) instead. See Hadley Wickham's 5-minute video on tidy evaluation and (maybe) the tidy evaluation section of his Advanced R book for more information.

library(rlang)
library(dplyr)

filter_data <- function(df, spp, LENGTH, WIDTH) {
  res <- df %>% 
    filter(Species == spp) %>% 
    select(.data[[LENGTH]], .data[[WIDTH]]) %>%        
    mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>% 
    filter(sum > 4) %>% 
    nrow()
  return(res)
}

filter_data(iris, "virginica", "Sepal.Length", "Sepal.Width")
#> [1] 50


filter_rlang <- function(df, spp, LENGTH, WIDTH) {
  res <- df %>% 
    filter(Species == spp) %>% 
    select({{LENGTH}}, {{WIDTH}}) %>%        
    mutate(sum = {{LENGTH}} + {{WIDTH}}) %>% 
    filter(sum > 4) %>% 
    nrow()
  return(res)
}

filter_rlang(iris, "virginica", Sepal.Length, Sepal.Width)
#> [1] 50

Created on 2019-11-10 by the reprex package (v0.3.0)
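A possible variation on the string-based version, offered as a sketch under assumptions rather than a correction: as far as I know, newer tidyselect releases warn when .data is used inside select(), and all_of() is the suggested way to select columns from a character vector. The function name filter_data_all_of is made up for this example, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical variant of filter_data(): same string interface, but the
# columns are selected with all_of() instead of .data[[ ]].
filter_data_all_of <- function(df, spp, LENGTH, WIDTH) {
  df %>%
    filter(Species == spp) %>%
    # all_of() selects the columns whose names are stored in the vector
    select(all_of(c(LENGTH, WIDTH))) %>%
    # .data[[ ]] is still used in mutate(), where it looks up a column by name
    mutate(sum = .data[[LENGTH]] + .data[[WIDTH]]) %>%
    filter(sum > 4) %>%
    nrow()
}

filter_data_all_of(iris, "virginica", "Sepal.Length", "Sepal.Width")
#> [1] 50
```

The split mirrors how the two verbs work: select() chooses columns, so it takes a selection helper, while mutate() computes with column values, so the .data pronoun fits there.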

Tung