
I have the same problem as described in the question "Using filter_ in dplyr where both field and value are in variables". However, the answers there are quite outdated (the base R version still works, but I'm interested in the dplyr version).

search_col <- c("Species", "Sepal.Length")
search_value <- c("setosa", 5.0)

iris %>% filter_(.dots = paste0(search_col, "=='", search_value, "'"))

Could someone please show how to rewrite this using dplyr versions > 1.0.0?

gombi

1 Answer


1) Firstly note that

search_value <- c("setosa", 5.0)

will coerce 5.0 to the character string "5", which is problematic. Instead, create a data frame and then use inner_join.

(If search_value contained only strings rather than a mix of strings and numerics, then in terms of the variables shown in the question we could use the commented-out line below, or simply use search_col in place of names(searchDF) and search_value in place of searchDF elsewhere.)

# searchDF <- as.data.frame(setNames(as.list(search_value), search_col))
searchDF <- data.frame(Species = "setosa", Sepal.Length = 5.0)
inner_join(iris, searchDF, by = names(searchDF))

giving:

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1            5         3.6          1.4         0.2  setosa
2            5         3.4          1.5         0.2  setosa
3            5         3.0          1.6         0.2  setosa
4            5         3.4          1.6         0.4  setosa
5            5         3.2          1.2         0.2  setosa
6            5         3.5          1.3         0.3  setosa
7            5         3.5          1.6         0.6  setosa
8            5         3.3          1.4         0.2  setosa
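The coercion noted at the start of (1) can be checked directly. This is a minimal base R sketch, not part of the original answer; the name search_list is a hypothetical one introduced only for illustration.

```r
# Mixing a string and a number in c() yields a character vector
search_value <- c("setosa", 5.0)
typeof(search_value)   # "character"
search_value[2]        # "5"

# A list (hypothetical name) keeps each element's own type
search_list <- list(Species = "setosa", Sepal.Length = 5.0)
typeof(search_list$Sepal.Length)   # "double"
```

A one-row data frame, as used in searchDF above, preserves the types in the same way.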

2) If you must use filter then use cur_data() to refer to the data. The scalar variable can be used directly.

filter(iris, cur_data()[, names(searchDF)[1]] == searchDF[[1]] &
  cur_data()[, names(searchDF)[2]] == searchDF[[2]])
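As an aside that is not part of the original answer: cur_data() was deprecated in dplyr 1.1.0, so on current versions the same two conditions could instead be written with the .data pronoun. The sketch below assumes the searchDF from (1); stringsAsFactors = FALSE is added only so that it behaves the same on R versions before 4.0, and the names nms and res are illustrative.

```r
library(dplyr)

# Same one-row lookup table as in (1); keep Species as a character column
searchDF <- data.frame(Species = "setosa", Sepal.Length = 5.0,
                       stringsAsFactors = FALSE)
nms <- names(searchDF)

# .data[[name]] looks up a column whose name is held in a string;
# comma-separated conditions in filter() are combined with AND
res <- filter(iris,
  .data[[nms[1]]] == searchDF[[1]],
  .data[[nms[2]]] == searchDF[[2]])
```

This should return the same rows as the inner_join in (1).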

3) For an arbitrary number of conditions use reduce (from purrr) or Reduce (from base R).

library(dplyr)
library(purrr)

myfilter <- function(x, nm) filter(x, cur_data()[, nm] == searchDF[[nm]])
iris %>% reduce(names(searchDF), myfilter, .init = .)
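The following is not from the answer above but a sketch of one possible alternative for an arbitrary number of column/value pairs: build one condition per pair and splice them all into a single filter() call with rlang's !!! operator. The names search_list, conds and result are hypothetical.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical named list of column = value pairs; a list keeps each value's type
search_list <- list(Species = "setosa", Sepal.Length = 5.0)

# Build one unevaluated condition per pair, e.g. .data[["Species"]] == "setosa"
conds <- imap(search_list, function(value, nm) expr(.data[[!!nm]] == !!value))

# filter() rejects named arguments, so drop the names before splicing with !!!
result <- filter(iris, !!!unname(conds))
nrow(result)
```

Because the conditions are expressions rather than pasted strings, no quoting of the values is needed, which is the main difference from the filter_(.dots = ...) call in the question.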
G. Grothendieck
  • There's risk in the assumption of floating-point equality (https://stackoverflow.com/q/9508518, https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754). While it seems to work fine here, it is not hard to come up with examples where it will not do as expected. (The biggest problem is that it won't tell you there are problems, it'll just return fewer records than expected.) As an alternative, I suggest `sqldf` or `data.table` for range-based joins. – r2evans Feb 01 '21 at 17:08
  • ... or `fuzzyjoin`. – r2evans Feb 01 '21 at 17:15
  • @r2evans, While true it is a side issue and not entirely relevant to the main intent of the question. – G. Grothendieck Feb 01 '21 at 17:24
  • I'm not saying that every answer on SO has to take into account all possibilities, but *this* is certainly in the realm of what I'll call a common misunderstanding about programming and data-science-y stuff in general. Either way, the point of my comment was not to take away from your solution, more to identify that there is an often-masked risk with floating-point tests of equality. – r2evans Feb 01 '21 at 17:27
  • Thanks, an `inner_join` is also a good solution, just like base R is, and probably many other approaches. I was specifically interested in a solution with filter, in a similar fashion to the sample code. Regarding the coercion, good point, but that was not the focus here, and in my real data I only have strings. Your filter solution works, but it doesn't generalize well if I have more than 2 key-value pairs. I am really interested in how to rewrite the piece of code that I asked about in the question. – gombi Feb 01 '21 at 17:45
  • An arbitrary number of conditions was not part of the original question or of the question in the link; however, the join approach does handle this elegantly. filter really is not the best approach for this, but I have added a solution in (3) using reduce from purrr (or use Reduce in base R). I have also shown, in a commented-out line of code, how the variables in the question can be used to build searchDF if they had all been strings. – G. Grothendieck Feb 01 '21 at 18:21
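The floating-point concern raised in the comments can be illustrated with a small made-up data frame. This sketch uses dplyr's near(), which compares with a tolerance; it is a different technique from the range-based joins and fuzzyjoin suggested in the comments, and the data and names (df, exact, approx) are invented for illustration.

```r
library(dplyr)

# 0.1 + 0.2 is not exactly 0.3 in double-precision arithmetic
df <- data.frame(id = 1:3, value = c(0.1 + 0.2, 0.3, 0.5))

# Exact equality silently drops the first row
exact <- filter(df, value == 0.3)

# near() compares within a small tolerance, so the first row is kept as well
approx <- filter(df, near(value, 0.3))
```

Here exact keeps only id 2, while approx keeps ids 1 and 2, which is the "fewer records than expected" failure mode described in the first comment.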