How to return only the rows of a data frame in which any column cell contains specific strings?

Question

I have a data frame like the one below, but with 384 columns:

id  col1    col2       col3     col4    col5    .....     col385


1       B45-P   Y   X       RH_B17   S-B45   IU_B34

'
             IU_B34 Y   Y   Y      X

.   S-B45                   RH_B17         X

'
            RH_B17                 X
'
    X   S-B45       X   x   X   IU_B34     X


155 Y   RH_B17              Y       X

I want to filter the above data frame and keep only the rows in which at least one column contains B45, B17, or B34.

I have attached an image of the data frame.

Hi and welcome to Stack Overflow. Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1 to learn how to ask a good question, and edit your question accordingly. As of now it is very unclear what you are asking. – Cettt, Apr 23 '19 at 20:30

score 0 · Answer 1 · edited Apr 23 '19 at 23:15

Clunky, but it works for me:

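library(tidyr)
library(stringr)
# Paste all columns of each row into one string, then keep the rows where that string matches any code
df[str_detect(string = unite(df, col = "all", sep = " ")$all, pattern = "B45|B17|B34"), ]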

edited Apr 23 '19 at 23:15, Paul Wildenhain

answered Apr 23 '19 at 20:55, Michael

`tidyr::unite()` takes a data frame (`data = `) and combines all of the specified columns (or all columns, if you don't specify any) into a single column with the name given in the `col = ` argument. `stringr::str_detect()` takes that single column as a vector and returns a logical vector indicating which rows contain the string(s). To match multiple strings in the `pattern = ` argument of `str_detect()`, separate them with `|`. – Michael Apr 23 '19 at 21:01
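Since the question has no reproducible data, here is a minimal sketch of the approach described above on a made-up data frame. The `df` contents and the helper names `combined`, `keep`, `keep_base` and `result` are assumptions for illustration only, not the asker's real data. The last step shows a base R variant with `apply()` and `grepl()`, which is a swapped-in alternative and not part of the original answer.

```r
library(tidyr)
library(stringr)

# Hypothetical toy data standing in for the 384-column data frame
df <- data.frame(
  id   = 1:5,
  col1 = c("B45-P", "Y", NA, "X", "Y"),
  col2 = c("Y", "Y", "X", "Y", "IU_B34"),
  col3 = c("X", "Y", "RH_B17", NA, "X"),
  stringsAsFactors = FALSE
)

# Step 1: paste every column of each row into one string
# (no columns are named, so unite() combines all of them;
#  NA cells become the text "NA" with the default na.rm = FALSE)
combined <- unite(df, col = "all", sep = " ")$all

# Step 2: TRUE for rows whose combined string contains any of the codes
keep <- str_detect(combined, pattern = "B45|B17|B34")

# Step 3: subset the original data frame with the logical vector
result <- df[keep, ]
print(result)

# Alternative without extra packages (an assumption, not from the answer):
# test each row's cells individually and keep the row if any cell matches
keep_base <- apply(df, 1, function(row) any(grepl("B45|B17|B34", row)))
```

Because `unite()` pastes the whole row together, the `id` column is searched as well. If that is a concern, drop it before uniting, for example with `unite(df[-1], ...)`.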
