31

I'd like to pull some data from a sql server with a dynamic filter. I'm using the great R package dplyr in the following way:

#Create the filter
filter_criteria = ~ column1 %in% some_vector
#Connect to the database
connection <- src_mysql(dbname <- "mydbname", 
             user <- "myusername", 
             password <- "mypwd", 
             host <- "myhost") 
#Get data
data <- connection %>%
 tbl("mytable") %>% #Specify which table
 filter_(.dots = filter_criteria) %>% #non standard evaluation filter
 collect() #Pull data

This piece of code works fine but now I'd like to loop it somehow on all the columns of my table, thus I'd like to write the filter as:

#Dynamic filter
i <- 2 #With a loop on this i for instance
which_column <- paste0("column",i)
filter_criteria <- ~ which_column %in% some_vector

And then reapply the first code with the updated filter.

Unfortunately this approach doesn't give the expected results. In fact it does not give any error but doesn't even pull any result into R. In particular, I looked a bit into the SQL query generated by the two pieces of code and there is one important difference.

While the first, working, code generates a query of the form:

SELECT ... FROM ... WHERE 
`column1` IN ....

(` sign in the column name), the second one generates a query of the form:

SELECT ... FROM ... WHERE 
'column1' IN ....

(' sign in the column name)

Does anyone have any suggestion on how to formulate the filtering condition to make it work?

hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
Lorenzo Rossi
  • 1,481
  • 1
  • 9
  • 16

3 Answers3

43

It's not really related to SQL. This example in R does not work either:

df <- data.frame(
     v1 = sample(5, 10, replace = TRUE),
     v2 = sample(5,10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)

It does not work because you need to pass to filter_ the expression ~ v1 == 1 — not the expression ~ "v1" == 1.

To solve the problem, simply use the quoting operator quo and the dequoting operator !!

library(dplyr)
which_column = quot(v1)
df %>% filter(!!which_column == 1)
Matthew
  • 2,628
  • 1
  • 20
  • 35
  • Hey Matthew, thank you very much for your answer. The first method you proposed solved my problem, thank you!! I did read the nse vignette and went very close to this solution myself, but I forgot the as.name/as.symbol call. The second method you suggest is not feasible in the code I'm writing since there is not dataframe in R yet, the filter must act as WHERE clasue in the sql query generated and then pull the data. That's the reason why I included the mentions of SQL in the title and in the text of the question. Thank you again for your answer! – Lorenzo Rossi Oct 23 '14 at 08:41
  • Could you please explain what can be done if 1 is also yet another variable? something like filter_criteria <- interp(~ which_column == val123, which_column = as.name("v1"), val123=???) Here val123 is an integer – qwerty123 Jun 04 '16 at 20:51
  • filter_criteria <- interp(~ var1 == var2, var1 = as.name("v1"), var2 = as.name("v2")) – Matthew Jun 06 '16 at 20:02
7

An alternative solution, with dplyr version 0.5.0 (probably implemented earlier than that), it is possible to pass a composed string as the .dots argument, which I find more readable than the lazyeval::interp solution:

df <- data.frame(
     v1 = sample(5, 10, replace = TRUE),
     v2 = sample(5,10, replace = TRUE)
)

which_col <- "v1"
which_val <- 1
df %>% filter_(.dots= paste0(which_col, "== ", which_val))

  v1 v2
1  1  1
2  1  2
3  1  4

UPDATE for dplyr 0.6 and later:

packageVersion("dplyr")
# [1] ‘0.5.0.9004’

df %>% filter(UQ(rlang::sym(which_col))==which_val)
#OR
df %>% filter((!!rlang::sym(which_col))==which_val)

(Similar to @Matthew 's response for dplyr 0.6, but I assume that which_col is a string variable.)

2nd UPDATE: Edwin Thoen created a nice cheatsheet for tidy evaluation: https://edwinth.github.io/blog/dplyr-recipes/

LmW.
  • 1,364
  • 9
  • 16
5

Here's a slightly less verbose solution and one which uses the typical behavior of the extract function, '[' in selecting a column by character value rather than converting it to a language element:

df %>% filter(., '['(., which_column)==1 )

set.seed(123)
df <- data.frame(
      v1 = sample(5, 10, replace = TRUE),
      v2 = sample(5,10, replace = TRUE)
 )
which_column <- "v1"
df %>% filter(., '['(., which_column)==1)
#  v1 v2
#1  1  5
IRTFM
  • 258,963
  • 21
  • 364
  • 487