Extract rows on the basis of multiple column values

Question

I would like to extract rows from a data frame on the basis of values from two (or more) columns, where the columns contain the same character strings. For example, in the data frame below, I would like to extract all the rows that contain “jam” and “fish”.

This is sort of close, but it focuses on a single column rather than selecting columns on 2 or more columns.

df <- structure(list(num = 1:4, 
                     term_1 = c("jam", "bananna", "fish", 
                                           "carrot"), 
                     term_2 = c("fish", "jam", "apple", "halva"), 
                     term_3 = c("halva", "fish", "carrot", "fish")), 
                row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

df

#>   num  term_1 term_2 term_3
#> 1   1     jam   fish  halva
#> 2   2 bananna    jam   fish
#> 3   3    fish  apple carrot
#> 4   4  carrot  halva   fish

I thought something like this might work, but it doesn't produce what I'm looking for.

df %>% 
  filter(term_1 == c("jam", "fish") | term_2 == c("jam", "fish") | 
           term_3 == c("jam", "fish"))

I’ve tried also tried some other filter() variants--along with if_all()--and str_select with filter(), but I'm missing some detail. Any direction will be appreciated.

^{Created on 2021-07-12 by the reprex package (v2.0.0)}

It would look someting like: `1 jam fish halva` . I have no idea how to format this in the comments. Apologies. — avgoustisw, Jul 12 '21 at 23:31
The expected output would be rows 1 and 2 in the data frame. — avgoustisw, Jul 12 '21 at 23:39

akrun · Accepted Answer · 2021-07-12T23:40:09.880

2

We could use

library(dplyr)
df %>%
    filter(if_any(-num, ~ . %in% "jam") & 
       if_any(-num, ~ . %in% "fish"))
# A tibble: 2 x 4
    num term_1  term_2 term_3
  <int> <chr>   <chr>  <chr> 
1     1 jam     fish   halva 
2     2 bananna jam    fish

edited Jul 12 '21 at 23:40

answered Jul 12 '21 at 23:20

akrun

874,273
37
540
662

Thank you--but this seems to produce the full dataset as a result. – avgoustisw Jul 12 '21 at 23:32
@avgoustisw do you want `df %>% filter(if_any(-num, ~ . %in% "jam") & if_any(-num, ~ . %in% "fish"))` – akrun Jul 12 '21 at 23:38

score 1 · Answer 2 · answered Jul 12 '21 at 23:56

1

You can also use rowSums:

df%>%filter(rowSums(across(-num, ~.%in% c("fish", 'jam')))>=2)

# A tibble: 2 x 4
    num term_1  term_2 term_3
  <int> <chr>   <chr>  <chr> 
1     1 jam     fish   halva 
2     2 bananna jam    fish

answered Jul 12 '21 at 23:56

GuedesBF

8,409
5
19
37

Thanks @GuedesBF--can you explain what the -num and ~. arguments are doing? – avgoustisw Jul 13 '21 at 00:25
1

`-num` is telling `across()` to apply the lambda function (~...) to all columns BUT 'num'. The tilde (~) is shorform for the lambda function. The dot(.) is a placeholder for all the data that comes from the left side of the pipe (`%>%`), or `df`in this case. In this code, the dot can be replaced with `df`. In `purrr` and `dplyr`, `~x+1` (lambda function) is the same as`function(x) x+1`. You should check the `tidyverse`, it is awesome. I strongly recommend you start with dplyr, than purrr. https://dplyr.tidyverse.org/reference/across.html – GuedesBF Jul 13 '21 at 01:16
This will also select rows with values "fish", "fish" or "jam" , "jam" – Ronak Shah Jul 13 '21 at 06:47
Yes, you are right, Ronak. I suspect it is not an issue with the OPs data. But I agree `condition 1 & condition 2` is the safest way to go here. – GuedesBF Jul 13 '21 at 15:10

score 1 · Answer 3 · answered Jul 13 '21 at 06:48

1

You can use rowSums as -

df[rowSums(df[-1] == 'fish') > 0 & rowSums(df[-1] == 'jam') > 0, ]

#   num term_1  term_2 term_3
#  <int> <chr>   <chr>  <chr> 
#1     1 jam     fish   halva 
#2     2 bananna jam    fish

answered Jul 13 '21 at 06:48

Ronak Shah

377,200
20
156
213

Extract rows on the basis of multiple column values

3 Answers3