I want to do a similar thing as in this thread: Subset multiple columns in R - more elegant code?

I have data that looks like this:

df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

I want to subset the data to the rows where the criteria is met in at least two columns, that is, where the string in at least two of the three columns Col1, Col2 and Col3 is "A". In the example above, the subset would be the first and last rows of the data frame df.

KGB91

2 Answers

You can use rowSums:

df[rowSums(df[-1] == criteria) >= 2, ]

#  x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A
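
Broken into steps, the one-liner above can be sketched roughly as follows. The intermediate names is_match and n_match are only for illustration and are not part of the original answer:

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))
criteria <- "A"

# Logical matrix: TRUE where a cell equals criteria (the first column, x, is dropped)
is_match <- df[-1] == criteria

# TRUE counts as 1, so rowSums() gives the number of matching columns per row
n_match <- rowSums(is_match)

# Keep the rows with at least two matching columns
res <- df[n_match >= 2, ]
```

Here n_match holds one count per row, which makes it easy to change the threshold or inspect which rows are borderline.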

If criteria has length > 1, == is not appropriate, because it compares element-wise instead of testing membership. In that case use sapply with %in%:

df[rowSums(sapply(df[-1], `%in%`, criteria)) >= 2, ]
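
As a small sketch of the multi-value case: the two-element vector criteria2 below is a made-up example, not something from the question.

```r
df <- data.frame(x = 1:4,
                 Col1 = c("A", "A", "C", "B"),
                 Col2 = c("A", "B", "B", "A"),
                 Col3 = c("A", "C", "C", "A"))

# Hypothetical multi-value criteria (assumption for illustration)
criteria2 <- c("A", "B")

# sapply() applies %in% to each column and returns a logical matrix,
# TRUE where the cell is one of the values in criteria2
hits <- sapply(df[-1], `%in%`, criteria2)

# Keep the rows where at least two columns contain one of the values
res2 <- df[rowSums(hits) >= 2, ]
```

With this input, rows 1, 2 and 4 each have at least two cells equal to "A" or "B", so those are the rows kept.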

In dplyr you can use filter with rowwise:

library(dplyr)
df %>%
  rowwise() %>%
  filter(sum(c_across(starts_with('col')) %in% criteria) >= 2)
Ronak Shah
  • Thanks, the latter solution works fine. One question: If I want to replace `starts_with` with a vector of column names instead, how do I do that? – KGB91 Oct 08 '20 at 10:11
  • You can use `all_of`: `cols <- c('Col1', 'Col2', 'Col3')` and then `df %>% rowwise() %>% filter(sum(c_across(all_of(cols)) %in% criteria) >= 2)`. The first solution should also work if you use `cols` in place of `-1`, like this: `df[rowSums(df[cols] == criteria) >= 2, ]` – Ronak Shah Oct 08 '20 at 10:16

We can use subset with apply:

subset(df, apply(df[-1] == criteria, 1, sum) > 1)
#  x Col1 Col2 Col3
#1 1    A    A    A
#4 4    B    A    A
akrun