Subset multiple columns in R with multiple matches

Question

I want to do something similar to this thread: Subset multiple columns in R - more elegant code?

I have data that looks like this:

df <- data.frame(x = 1:4, Col1 = c("A", "A", "C", "B"), Col2 = c("A", "B", "B", "A"), Col3 = c("A", "C", "C", "A"))
criteria <- "A"

What I want to do is subset the rows where the criterion is met in at least two columns, that is, where at least two of the three columns contain the string A. In the example above, the subset would be the first and last rows of df.

score 1 · Accepted Answer · answered Oct 08 '20 at 09:53


You can use rowSums:

df[rowSums(df[-1] == criteria) >= 2, ]

#  x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A
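To see why this works, it can help to split the one-liner into its steps. The sketch below (base R only, using the question's data) keeps the intermediate logical matrix and the per-row counts in named variables; the names hits and counts are just for illustration.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

# Comparing the data frame (minus its first column, x) with a single string
# returns a logical matrix: TRUE wherever a cell equals criteria.
hits <- df[-1] == criteria

# rowSums() counts the TRUE values in each row; here the counts are 3, 1, 0, 2.
counts <- rowSums(hits)

# Keep the rows with at least two matching columns (rows 1 and 4).
df[counts >= 2, ]
```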

If criteria has more than one element, you cannot use == directly; in that case, use sapply with %in%.

df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
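The reason == breaks with a longer criteria is recycling: the vector is reused cell by cell instead of being treated as a set of allowed values. A small sketch with the assumed criteria c("A", "B") contrasts the two counts.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- c("A", "B")   # assumed multi-value criteria, for illustration

# == recycles c("A", "B") over the cells in column order, so with four rows
# row 1 of every column is compared with "A", row 2 with "B", and so on.
wrong <- rowSums(df[-1] == criteria)                 # 3, 1, 0, 1

# %in% asks whether each cell equals *any* of the criteria values.
right <- rowSums(sapply(df[-1], `%in%`, criteria))   # 3, 2, 1, 3

df[right >= 2, ]   # rows 1, 2 and 4
```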

In dplyr, you can use filter with rowwise:

library(dplyr)
df %>%
  rowwise() %>%
  filter(sum(c_across(starts_with('col')) %in% criteria) >= 2)
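Note that starts_with('col') matches Col1 to Col3 only because dplyr's starts_with() ignores case by default (ignore.case = TRUE). The sketch below mirrors that column selection in base R, without dplyr, using tolower() and startsWith(); it is an equivalent for this data, not part of the answer.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

# Case-insensitive prefix match, like starts_with('col') with its defaults.
cols <- startsWith(tolower(names(df)), "col")
names(df)[cols]   # "Col1" "Col2" "Col3"

# Count, per row, how many selected columns hold a value in criteria.
counts <- rowSums(sapply(df[cols], `%in%`, criteria))
df[counts >= 2, ]   # rows 1 and 4
```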


Ronak Shah


Thanks, the latter solution works fine. One question: if I want to replace `starts_with` with a vector of column names, how do I do that? – KGB91 Oct 08 '20 at 10:11

You can use `all_of`: define `cols <- c('Col1', 'Col2', 'Col3')` and then run `df %>% rowwise() %>% filter(sum(c_across(all_of(cols)) %in% criteria) >= 2)`. The first solution also works if you use `cols` in place of `-1`, like this: `df[rowSums(df[cols] == criteria) >= 2, ]` – Ronak Shah Oct 08 '20 at 10:16
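Combining the ideas above (a vector of column names, %in% for multi-value criteria, and a threshold), a reusable base-R helper might look like the sketch below. The function name subset_min_matches and its arguments are hypothetical, not from either answer. It swaps sapply() plus rowSums() for Reduce() over lapply(), because sapply() simplifies a single-row result to a plain vector that rowSums() rejects.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))

# Hypothetical helper: keep the rows of `data` where at least `min_matches`
# of the columns in `cols` hold a value that appears in `criteria`.
subset_min_matches <- function(data, cols, criteria, min_matches = 2) {
  # One logical vector per column; Reduce() adds them element-wise, starting
  # from 0L, so `counts` is the number of matching columns in each row.
  counts <- Reduce(`+`, lapply(data[cols], `%in%`, criteria), 0L)
  data[counts >= min_matches, , drop = FALSE]
}

cols <- c("Col1", "Col2", "Col3")
subset_min_matches(df, cols, "A")            # rows 1 and 4
subset_min_matches(df, cols, c("A", "B"))    # rows 1, 2 and 4
subset_min_matches(df[4, ], cols, "A")       # single-row input still works
```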

score 0 · Answer 2 · answered Oct 09 '20 at 02:31


We can use subset with apply:

subset(df, apply(df[-1] == criteria, 1, sum) > 1)
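apply(m, 1, sum) computes the same per-row counts as rowSums(m); the rowSums() documentation describes it as equivalent to apply() with FUN = sum but much faster. Because the counts are whole numbers, "> 1" and ">= 2" select the same rows. A quick check on the question's data:

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"
m <- df[-1] == criteria   # logical matrix of matches

by_apply   <- apply(m, 1, sum)   # row sums via apply()
by_rowsums <- rowSums(m)         # same sums, vectorised

# Both conditions keep rows 1 and 4.
subset(df, by_apply > 1)
subset(df, by_rowsums >= 2)
```

One practical difference from the `[` approach: subset() treats NA values in the condition as FALSE, so rows with missing values are dropped rather than returned as NA rows.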
#   x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A


akrun

