I'd like to subset a data frame to include only the rows that contain a specific string ("ab" in this example) in any of many columns. Here's my example:

>df
    ID  RESULT1   RESULT2   RESULT3   RESULT4   ...   RESULT30
1   001   abc        abcd     abcdef     cdef    ...      efs
2   002   cd          efg       hij       kl     ...      fzh
3   003   zabc        efg       jgh       ldc    ...      bcs
4   004   efx         cde       lfs       ab     ...      cd
5   005   ftx         txs       sgs       lfc    ...      edf
6   006   lsd         mde       ald       ldf    ...      klj
7   007   kjl         ell       oip       lab    ...      jkl

The expected output would be something like this (rows that have "ab" in any column):

>df.sub
   ID   RESULT1   RESULT2   RESULT3   RESULT4   ...   RESULT30
1  001   abc        abcd     abcdef     cdef    ...      efs
3  003   zabc        efg       jgh       ldc    ...      bcs
4  004   efx         cde       lfs       ab     ...      cd
7  007   kjl         ell       oip       lab    ...      jkl

Can somebody give some solutions? I am new to R. Thanks in advance.

se2se2

2 Answers

We loop through the columns of 'df' (all except ID) with lapply and use grepl to match the pattern "ab", which returns a list of logical vectors, one per column. Reduce with | then combines the corresponding list elements, giving a single logical vector that is TRUE wherever any column matched. That vector can be used for subsetting the rows of the initial dataset.

df[Reduce(`|`, lapply(df[-1], grepl, pattern="ab")),]
#  ID RESULT1 RESULT2 RESULT3 RESULT4 RESULT30
#1  1     abc    abcd  abcdef    cdef      efs
#3  3    zabc     efg     jgh     ldc      bcs
#4  4     efx     cde     lfs      ab       cd
#7  7     kjl     ell     oip     lab      jkl
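
To see what each step contributes, the one-liner can be unpacked as in the sketch below. It uses a small hypothetical data frame `toy` (not the data from the question), and the intermediate names `hits`, `keep` and `toy_sub` are introduced here only for illustration.

```r
# Hypothetical three-row data frame, for illustration only
toy <- data.frame(ID = 1:3,
                  A = c("abc", "cd", "ef"),
                  B = c("xy", "zz", "lab"),
                  stringsAsFactors = FALSE)

# Step 1: one logical vector per non-ID column
hits <- lapply(toy[-1], grepl, pattern = "ab")

# Step 2: combine the vectors elementwise with OR
keep <- Reduce(`|`, hits)

# Step 3: keep the rows where any column matched
toy_sub <- toy[keep, ]
```

Because "ab" contains no regular-expression metacharacters, passing `fixed = TRUE` to grepl should give the same rows here; that is an optional variation, not part of the answer above.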

data

df <- structure(list(ID = 1:7, RESULT1 = c("abc", "cd", "zabc", "efx", 
"ftx", "lsd", "kjl"), RESULT2 = c("abcd", "efg", "efg", "cde", 
"txs", "mde", "ell"), RESULT3 = c("abcdef", "hij", "jgh", "lfs", 
"sgs", "ald", "oip"), RESULT4 = c("cdef", "kl", "ldc", "ab", 
"lfc", "ldf", "lab"), RESULT30 = c("efs", "fzh", "bcs", "cd", 
"edf", "klj", "jkl")), .Names = c("ID", "RESULT1", "RESULT2", 
"RESULT3", "RESULT4", "RESULT30"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7"))
akrun

Here is a solution with base R:

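df[rowSums(matrix(grepl("ab", as.matrix(df[-1])), nrow=dim(df[-1])[1])) > 0, ]

The result of grepl() is always a plain vector, so the outer matrix() restores the row-by-column layout. rowSums() then counts the matches in each row, and the comparison > 0 turns those counts into the logical vector needed for subsetting; without it, the counts themselves would be used as row indices.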
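
The shape issue can be checked directly, and sapply() offers one way around the reshaping. The following is a sketch on a hypothetical data frame `toy`, with illustrative names `m`, `flat` and `keep`; it assumes at least two rows, since sapply() would otherwise return a vector rather than a matrix.

```r
# Hypothetical three-row data frame, for illustration only
toy <- data.frame(ID = 1:3,
                  A = c("abc", "cd", "ef"),
                  B = c("xy", "zz", "lab"),
                  stringsAsFactors = FALSE)

m <- as.matrix(toy[-1])   # 3 x 2 character matrix
flat <- grepl("ab", m)    # plain logical vector of length 6
is.null(dim(flat))        # TRUE: the matrix dimensions were dropped

# sapply() builds the logical matrix column by column instead
keep <- rowSums(sapply(toy[-1], grepl, pattern = "ab")) > 0
toy[keep, ]
```

Here `keep` plays the same role as the logical vector in the answer, so either form can be used for the row subset.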

data

df <- structure(list(ID = 1:7, RESULT1 = c("abc", "cd", "zabc", "efx", 
"ftx", "lsd", "kjl"), RESULT2 = c("abcd", "efg", "efg", "cde", 
"txs", "mde", "ell"), RESULT3 = c("abcdef", "hij", "jgh", "lfs", 
"sgs", "ald", "oip"), RESULT4 = c("cdef", "kl", "ldc", "ab", 
"lfc", "ldf", "lab"), RESULT30 = c("efs", "fzh", "bcs", "cd", 
"edf", "klj", "jkl")), .Names = c("ID", "RESULT1", "RESULT2", 
"RESULT3", "RESULT4", "RESULT30"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7"))
jogo