52

I have a data frame and want to filter it in one of two ways, by either column "this" or column "that". I would like to be able to refer to the column name as a variable. How (in dplyr, if that makes a difference) do I refer to a column name by a variable?

library(dplyr)
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
df
#   this that
# 1    1    1
# 2    2    1
# 3    2    2
df %>% filter(this == 1)
#   this that
# 1    1    1

But say I want to use the variable column to hold either "this" or "that", and filter on whatever the value of column is. Both as.symbol and get work in other contexts, but not this:

column <- "this"
df %>% filter(as.symbol(column) == 1)
# [1] this that
# <0 rows> (or 0-length row.names)
df %>% filter(get(column) == 1)
# Error in get("this") : object 'this' not found

How can I turn the value of column into a column name?

Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
William Denton
  • 737
  • 1
  • 5
  • 11
  • Why do you want to do this? Trying to parameterize the choice of variable? – smci Nov 30 '14 at 01:38
  • 4
    I have a data set with two types of things in it, and different columns for each. I want to approach things this way because I'm using Shiny to make an interactive visualization and I want to let people choose which type of thing to look at, but then use the same code to extract the data, passing in the column name based on their choice. – William Denton Nov 30 '14 at 18:55
  • Almost posted my question and found this :) – biocyberman Feb 17 '16 at 19:51
  • @Salim B's answer below is one of the best I've seen on this (confusing to me) topic. If you are looking to encode both the LHS and RHS of the `filter` arguments in a variable, this post has another approach: https://www.r-bloggers.com/2020/05/filtering-with-string-statements-in-dplyr/ – Josh Mar 12 '22 at 23:26
  • Here is a discussion of how to use a variable on the RHS when the variable name matches a column name: https://github.com/tidyverse/dplyr/issues/3139 – Josh Mar 13 '22 at 01:47

8 Answers8

88

Using rlang's injection paradigm

From the current dplyr documentation (emphasis by me):

dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.

So, essentially we need to perform two steps to be able to refer to the value "this" of the variable column inside dplyr::filter():

  1. We need to turn the variable column which is of type character into type symbol.

    Using base R this can be achieved by the function as.symbol() which is an alias for as.name(). The former is preferred by the tidyverse developers because it

    follows a more modern terminology (R types instead of S modes).

    Alternatively, the same can be achieved by rlang::sym() from the tidyverse.

  2. We need to inject the symbol from 1) into the dplyr::filter() expression.

    This is done by the so called injection operator !! which is basically syntactic sugar allowing to modify a piece of code before R evaluates it.

    (In earlier versions of dplyr (or the underlying rlang respectively) there used to be situations (incl. yours) where !! would collide with the single !, but this is not an issue anymore since !! gained the right operator precedence.)

Applied to your example:

library(dplyr)
df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(!!as.symbol(column) == 1)
#   this that
# 1    1    1

Using alternative solutions

Other ways to refer to the value "this" of the variable column inside dplyr::filter() that don't rely on rlang's injection paradigm include:

Salim B
  • 2,409
  • 21
  • 32
  • 2
    For the latter case you could use some extra round brackets `df %>% filter((!!as.name(column)) == 1)` – Martijn vd Voort Jan 04 '18 at 19:06
  • 1
    Great but "‘UQ()’ and ‘UQS()’ were soft-deprecated in rlang 0.2.0 in order to make the syntax of quasiquotation more consistent." – ggll Feb 20 '20 at 11:47
  • Thanks @ggll , I've updated the answer to account for recent improvements in dplyr/rlang. – Salim B Feb 20 '20 at 15:33
24

I would steer clear of using get() all together. It seems like it would be quite dangerous in this situation, especially if you're programming. You could use either an unevaluated call or a pasted character string, but you'll need to use filter_() instead of filter().

df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

Option 1 - using an unevaluated call:

You can hard-code y as 1, but here I show it as y to illustrate how you can change the expression values easily.

expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or 
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
#   this that
# 1    1    1

Option 2 - using paste() (and obviously easier):

df %>% filter_(paste(column, "==", 1))
#   this that
# 1    1    1

The main thing about these two options is that we need to use filter_() instead of filter(). In fact, from what I've read, if you're programming with dplyr you should always use the *_() functions.

I used this post as a helpful reference: character string as function argument r, and I'm using dplyr version 0.3.0.2.

Community
  • 1
  • 1
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
  • 1
    Option 1 is a robust solution as recommended by @hadley in several posts on SO and in the vignette("nse"). However, it looks like it goes against hadley's style of elegant function calls. Can't it be wrapped and presented in a simpler way? – biocyberman Feb 17 '16 at 19:55
  • 1
    why is get() considered dangerous? – pluke May 17 '16 at 21:11
  • 3
    This is a good answer for older versions of `dpylr` but the underscore functions are deprecated so @Salim-B's answer below is better going forward. – Elin Nov 26 '17 at 01:37
15

Here's another solution for the latest dplyr version:

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

df %>% filter(.[[column]] == 1)

#  this that
#1    1    1
paul_dg
  • 511
  • 5
  • 16
9

Regarding Richard's solution, just want to add that if you the column is character. You can add shQuote to filter by character values.

For example, you can use

df %>% filter_(paste(column, "==", shQuote("a")))

If you have multiple filters, you can specify collapse = "&" in paste.

df %>$ filter_(paste(c("column1","column2"), "==", shQuote(c("a","b")), collapse = "&"))
StatCC
  • 285
  • 3
  • 11
6

The latest way to do this is to use my.data.frame %>% filter(.data[[myName]] == 1), where myName is an environmental variable that contains the column name.

Phoenix Mu
  • 648
  • 7
  • 12
2

Or using filter_at

library(dplyr)
df %>% 
   filter_at(vars(column), any_vars(. == 1))
akrun
  • 874,273
  • 37
  • 540
  • 662
0

Like Salim B explained above but with a minor change:

df %>% filter(1 == !!as.name(column))

i.e. just reverse the condition because !! otherwise behaves like

!!(as.name(column)==1)
carand
  • 1
  • 1
0

You can use the across(all_of()) syntax, it takes a string as argument

column = "this"
df %>% filter(across(all_of(column)) == 1)