Non-standard evaluation is really handy when using dplyr's verbs, but it can be problematic when those verbs are used with function arguments. For example, let's say I want to create a function that returns the number of rows for a given species.
# Load packages and prepare data
library(dplyr)
library(lazyeval)
# I prefer lowercase column names
names(iris) <- tolower(names(iris))
# Number of rows for all species
nrow(iris)
# [1] 150
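For reference, the result I am after for a single species can be obtained with plain base subsetting (shown here only to make the target explicit):
# Target result: number of versicolor rows, without dplyr
nrow(iris[iris$species == "versicolor", ])
# [1] 50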
Example not working
This function doesn't work as expected because species is interpreted in the context of the iris data frame instead of being interpreted in the context of the function argument:
nrowspecies0 <- function(dtf, species){
  dtf %>%
    filter(species == species) %>%
    nrow()
}
nrowspecies0(iris, species = "versicolor")
# [1] 150
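Inside filter(), both sides of species == species are evaluated in the data frame, so the condition is true for every row. A minimal sketch outside the function shows the same behaviour:
# Both occurrences of species refer to the iris column,
# so the comparison is always TRUE and no row is dropped
iris %>%
  filter(species == species) %>%
  nrow()
# [1] 150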
3 example implementations
To work around non-standard evaluation, I usually append an underscore to the argument name:
nrowspecies1 <- function(dtf, species_){
  dtf %>%
    filter(species == species_) %>%
    nrow()
}
nrowspecies1(iris, species_ = "versicolor")
# [1] 50
# Thanks to R's partial matching of argument names,
# the argument species works too
nrowspecies1(iris, species = "versicolor")
# [1] 50
It is not completely satisfactory, since it changes the name of the function argument to something less user-friendly, or it relies on partial argument matching, which I'm afraid is not good practice for programming. To keep a nice argument name, I could do:
nrowspecies2 <- function(dtf, species){
  species_ <- species
  dtf %>%
    filter(species == species_) %>%
    nrow()
}
nrowspecies2(iris, species = "versicolor")
# [1] 50
Another way to work around non-standard evaluation, based on this answer, is to use interp() from lazyeval, which interprets species in the context of the function environment:
nrowspecies3 <- function(dtf, species){
  dtf %>%
    filter_(interp(~species == with_species,
                   with_species = species)) %>%
    nrow()
}
nrowspecies3(iris, species = "versicolor")
# [1] 50
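To make it clearer what interp() does, here is a minimal sketch of the formula it builds when the argument value is "versicolor" (printed output shown approximately as a comment):
# interp() substitutes with_species with the supplied value,
# yielding a formula that filter_() can evaluate
interp(~species == with_species, with_species = "versicolor")
# ~species == "versicolor"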
Considering the 3 functions above, what is the preferred (most robust) way to implement this filter function? Are there any other ways?