
I want to filter a data frame on a field whose name is stored in a variable, selecting a value that is also stored in a variable. Say I have

df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

The result I want is df[df$Unhappy == "Y", ].

I've read the NSE vignette to try to use filter_, but I can't quite understand it. I tried

df %>% filter_(.dots = ~ fld == sval)

which returned no rows. I got what I wanted with

df %>% filter_(.dots = ~ Unhappy == sval)

but that obviously defeats the purpose of having a variable to store the field name. Any clues, please? Eventually I want to use this where fld is a vector of field names and sval is a vector of filter values, one for each field in fld.

Ricky

5 Answers


You can try interp() from lazyeval:

 library(lazyeval)
 library(dplyr)
 df %>%
     filter_(interp(~v==sval, v=as.name(fld)))
 #   V Unhappy
 #1 1       Y
 #2 5       Y
 #3 3       Y
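What interp() does here is substitute the symbol named by fld into the formula before filter_() evaluates it. A minimal base R sketch of the same substitution idea, using bquote() and eval() as swapped-in base functions (this is not lazyeval's API), so it runs without lazyeval installed:

```r
df <- data.frame(V = c(6, 1, 5, 3, 2), Unhappy = c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

# as.name() turns the string "Unhappy" into the symbol Unhappy;
# bquote() splices that symbol into the expression wherever .() appears
cond <- bquote(.(as.name(fld)) == sval)
print(cond)
# Unhappy == sval

# Evaluate the expression with the data frame's columns in scope;
# sval is not a column, so it is found in the calling environment
keep <- eval(cond, df)
df[keep, ]
```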

For multiple key/value pairs, I found the following works, but I think there should be a better way.

  df1 %>% 
    filter_(interp(~v==sval1[1] & y ==sval1[2], 
           .values=list(v=as.name(fld1[1]), y= as.name(fld1[2]))))
 #  V Unhappy Col2
 #1 1       Y    B
 #2 5       Y    B

For these cases, I find the base R option easier. For example, to filter the rows on the 'key' columns in 'fld1' with the corresponding values in 'sval1', one option is Map. We subset the dataset (df1[fld1]), apply the FUN (==) to each column of df1[fld1] with its corresponding value in 'sval1', and then use & with Reduce to get a single logical vector that can be used to filter the rows of 'df1'.

 df1[Reduce(`&`, Map(`==`, df1[fld1], sval1)), ]
 #   V Unhappy Col2
 # 2 1       Y    B
 # 3 5       Y    B

data

df1 <- cbind(df, Col2= c("A", "B", "B", "C", "A"))
fld1 <- c(fld, 'Col2')
sval1 <- c(sval, 'B')    
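The Map/Reduce one-liner can be unpacked step by step. This sketch rebuilds df1, fld1 and sval1 with the same values as the data above (building the data frame directly rather than with cbind) and prints the intermediate per-column logical vectors before combining them:

```r
# Same values as the example data: V, Unhappy and the extra Col2 column
df1 <- data.frame(V = c(6, 1, 5, 3, 2),
                  Unhappy = c("N", "Y", "Y", "Y", "N"),
                  Col2 = c("A", "B", "B", "C", "A"))
fld1 <- c("Unhappy", "Col2")
sval1 <- c("Y", "B")

# Step 1: Map() pairs each selected column with its target value and
# applies `==`, giving one logical vector per column (named after it)
per_col <- Map(`==`, df1[fld1], sval1)
str(per_col)

# Step 2: Reduce() folds the list with `&`, so a row is kept only if
# every selected column matches its value
keep <- Reduce(`&`, per_col)
keep
# [1] FALSE  TRUE  TRUE FALSE FALSE

# Step 3: use the combined logical vector as the row index
res <- df1[keep, ]
res
```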
akrun
  • You've [answered this before](http://stackoverflow.com/questions/24569154/use-variable-names-in-functions-of-dplyr) as follows, modified here to fit this post's names: `df %>% filter(get(fld, envir=as.environment(df))==sval)`. I just tried that, and it worked, too. – ulfelder Aug 01 '15 at 09:38
  • @ulfelder In another post, Hadley commented that it is better not to use `get`. I don't have the link, though. – akrun Aug 01 '15 at 09:40

With rlang 0.4.0, there is a new, more intuitive way to handle this type of use case:

packageVersion("rlang")
# [1] ‘0.4.0’

library(dplyr)

df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

df %>% filter(.data[[fld]]==sval)

# OR, inside a function, pass a bare (unquoted) column name and embrace it with {{ }}
filter_col_val <- function(df, fld, sval) {
  df %>% filter({{fld}}==sval)
}

filter_col_val(df, Unhappy, "Y")

More information can be found at https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/
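For the question's eventual goal (a vector of field names with a matching vector of values), one option is to fold .data[[...]] filters over the pairs with base Reduce(). A minimal sketch, assuming a dplyr version that supports the .data pronoun (as in the rlang 0.4.0 example above); filter_by_values() is a hypothetical helper name, not part of dplyr:

```r
library(dplyr)

# Hypothetical helper: keep rows where every column named in `flds`
# equals the value at the same position in `svals`
filter_by_values <- function(df, flds, svals) {
  stopifnot(length(flds) == length(svals))
  # Reduce() applies one filter() per field/value pair, feeding each
  # result into the next step (a left fold that starts from df).
  # Note: flds and i are looked up outside the data, so a data column
  # with the same name would shadow them.
  Reduce(
    function(d, i) filter(d, .data[[flds[[i]]]] == svals[[i]]),
    seq_along(flds),
    init = df
  )
}

df1 <- data.frame(V = c(6, 1, 5, 3, 2),
                  Unhappy = c("N", "Y", "Y", "Y", "N"),
                  Col2 = c("A", "B", "B", "C", "A"))
res <- filter_by_values(df1, c("Unhappy", "Col2"), c("Y", "B"))
res
```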

Previous Answer

With dplyr 0.6.0 and later, this code works:

packageVersion("dplyr")
# [1] ‘0.7.1’

library(dplyr)

df <- data.frame(V=c(6, 1, 5, 3, 2), Unhappy=c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

df %>% filter(UQ(rlang::sym(fld))==sval)

#OR
df %>% filter((!!rlang::sym(fld))==sval)

#OR
fld <- quo(Unhappy)
sval <- "Y"
df %>% filter(UQ(fld)==sval)

More about the dplyr syntax is available at http://dplyr.tidyverse.org/articles/programming.html, and about quosures in the rlang package at https://cran.r-project.org/web/packages/rlang/index.html

If you find non-standard evaluation in dplyr 0.6+ challenging to master, Alex Hayes has an excellent write-up on the topic: https://www.alexpghayes.com/blog/gentle-tidy-eval-with-examples/
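The same sym()/!! idea extends to several fields: build one quoted condition per field/value pair and splice them all into filter() with !!!, which filter() combines with AND. A minimal sketch, assuming dplyr 0.7+ with rlang installed; flds and svals are illustrative names standing in for the question's vectors:

```r
library(dplyr)

df1 <- data.frame(V = c(6, 1, 5, 3, 2),
                  Unhappy = c("N", "Y", "Y", "Y", "N"),
                  Col2 = c("A", "B", "B", "C", "A"))
flds <- c("Unhappy", "Col2")
svals <- c("Y", "B")

# Build one quoted condition per pair: rlang::sym() turns each string
# into a column symbol, and !! injects it (and the value) into the call
conds <- Map(function(f, v) rlang::expr((!!rlang::sym(f)) == !!v), flds, svals)

# Map() names the list after flds; filter() rejects named conditions,
# so drop the names before splicing them all in with !!!
res <- df1 %>% filter(!!!unname(conds))
res
```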

Original Answer

With dplyr 0.5.0 and later, you can use a simpler syntax that is closer to the one @Ricky originally wanted, and which I also find more readable than lazyeval::interp:

df %>% filter_(.dots = paste0(fld, "=='", sval, "'"))

#  V Unhappy
#1 1       Y
#2 5       Y
#3 3       Y

#OR
df %>% filter_(.dots = glue::glue("{fld}=='{sval}'"))
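filter_() and the other underscore verbs have since been deprecated in dplyr. A minimal sketch, assuming a current dplyr with rlang installed, that keeps the string-building idea but parses the strings with rlang::parse_exprs() and splices them into the standard filter(); it also handles several field/value pairs at once:

```r
library(dplyr)

df1 <- data.frame(V = c(6, 1, 5, 3, 2),
                  Unhappy = c("N", "Y", "Y", "Y", "N"),
                  Col2 = c("A", "B", "B", "C", "A"))
flds <- c("Unhappy", "Col2")
svals <- c("Y", "B")

# One condition string per pair, e.g. `Unhappy` == "Y".
# Backticks protect non-syntactic column names; encodeString() adds
# the quotes and escapes any quote characters inside the values.
cond_strings <- paste0("`", flds, "` == ", encodeString(svals, quote = '"'))

# parse_exprs() turns the strings into expressions; !!! splices them
# into filter(), which combines multiple conditions with AND
res <- df1 %>% filter(!!!unname(rlang::parse_exprs(cond_strings)))
res
```

Because the strings are parsed as R code, they should only ever be built from trusted field names and values.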
LmW.

Here's a base R alternative. It is perhaps not very elegant, but it has the benefit of being fairly easy to understand:

df[df[colnames(df)==fld]==sval,]
#  V Unhappy
#2 1       Y
#3 5       Y
#4 3       Y
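It may help to see why this indexing works: selecting columns with a logical index returns a one-column data frame, and comparing a data frame with a value returns a logical matrix. The df[[fld]] variant in this sketch is a substitution for comparison, not part of the answer:

```r
df <- data.frame(V = c(6, 1, 5, 3, 2), Unhappy = c("N", "Y", "Y", "Y", "N"))
fld <- "Unhappy"
sval <- "Y"

# df[colnames(df) == fld] is a one-column data frame, so comparing it
# with sval gives a 5 x 1 logical matrix, which `[` uses as a row index
m <- df[colnames(df) == fld] == sval
dim(m)
# [1] 5 1

# df[[fld]] extracts the column itself as a vector, so the comparison
# is a plain logical vector and the row index is more direct
v <- df[[fld]] == sval
res <- df[v, ]
res
```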
RHertel

Following on from LmW: personally, I prefer a dplyr pipeline where the dots are built before the pipeline, so that it is easier to use programmatically, say in a loop over several filters.

dots <-  paste0(fld," == '",sval,"'")
df   %>% filter_(.dots = dots)

LmW's example is correct, but the values are hardcoded.

BarneyC
  • I don't see what is hardcoded in `df %>% filter_(.dots = paste0(fld, "=='", sval, "'"))` compared with your code. They are programmatically equivalent from what I can see. – LmW. May 13 '17 at 20:04
  • Yeah, sorry, that wasn't meant to sound so *harsh*. I merely meant that in your CORRECT response the filter is baked directly into the pipe. By defining the dots externally, it is a little easier to change them (say, in a loop) so that the pipe can be applied over a range of data with different filters each time. – BarneyC May 15 '17 at 16:39

I was trying to do the same thing, and it seems that dplyr now has built-in functionality to address exactly this.

Check the last example here: https://dplyr.tidyverse.org/reference/filter.html

I'm also pasting it here for simplicity:

library(dplyr)  # provides filter() and the starwars data

# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
vagvaf