13

I've got two character vectors:

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

I need to use x to filter y so that only the strings that contain none of the entries of x remain, like this:

y <- c("kot", "kk", "y")

The code should work for any size of vectors x and y.

So far I've tried gsub and grepl, but these only accept a single pattern at a time. I've also tried writing a loop, but the problem seems more difficult than I thought. The more sophisticated the solution, the better, but you can assume that in this case the vectors x and y have at most 200 entries.

Lecromine

3 Answers

24

We can use grep to find the values in y that match any of the entries in x (joined into a single pattern with "|") and exclude them using ! and %in%:

y[!y %in% grep(paste0(x, collapse = "|"), y, value = TRUE)]

#[1] "kot" "kk"  "y"  

Or, even better, use grepl, which returns a logical vector that can index y directly:

y[!grepl(paste0(x, collapse = "|"), y)]

A concise version uses grep with the invert and value parameters:

grep(paste0(x, collapse = "|"), y, invert = TRUE, value = TRUE)
#[1] "kot" "kk"  "y"  
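
Since the question asks for something that works for vectors of any size, here is a minimal sketch that wraps the grepl approach in a function. The name drop_matching is hypothetical, not part of base R or of the answer above. It guards one edge case: with an empty x, paste0(x, collapse = "|") produces "", and an empty pattern matches every string, so everything would be removed.

```r
# Hypothetical helper (illustration only): keep the elements of y
# that contain none of the strings in x.
drop_matching <- function(y, x) {
  # An empty x collapses to the pattern "", which matches every string,
  # so return y unchanged in that case.
  if (length(x) == 0) return(y)
  pattern <- paste0(x, collapse = "|")
  y[!grepl(pattern, y)]
}

x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

drop_matching(y, x)
#[1] "kot" "kk"  "y"

drop_matching(y, character(0))  # returns y unchanged
```

Note that the entries of x are still treated as regular expressions here, so characters such as "." or "(" would need escaping or a different matching strategy.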
Ronak Shah
4

The answer given by @Ronak looks preferable to mine, but another option is to use sapply with grepl to build a matrix of matches, with one row per element of y and one column per entry in x, and then to collapse each row with a call to apply.

> y[!apply(sapply(x, function(q) {grepl(q, y)}), 1, function(x) {sum(as.numeric(x)) > 0})]
[1] "kot" "kk"  "y"  

Here is what I mean by matrix of matches:

> sapply(x, function(q) { grepl(q, y) })
         a     b     c    kt
[1,]  TRUE  TRUE FALSE FALSE
[2,] FALSE FALSE FALSE FALSE
[3,] FALSE FALSE  TRUE FALSE
[4,] FALSE FALSE FALSE  TRUE
[5,] FALSE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE
       ^^^^ each column is a match result for each element of x
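
As a possible simplification of the roll-up step, the rows of the match matrix can be summed with rowSums instead of apply. This is a sketch rather than a replacement for the code above, and it assumes y has at least two elements, because sapply returns a plain vector rather than a matrix when y has length one.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# One column per entry of x, one row per element of y.
m <- sapply(x, grepl, y)

# A row sum of 0 means that no entry of x matched that element of y.
kept <- y[rowSums(m) == 0]
kept
#[1] "kot" "kk"  "y"
```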
Tim Biegeleisen
  • I agree with akrun. This is very helpful but in this case I prefer the grepl-solution for my vectors aren't that long. – Lecromine Nov 30 '16 at 10:49
0

This should also work:

y[Reduce("+", lapply(x, grepl, y, fixed = TRUE)) == 0]
# [1] "kot" "kk"  "y"  
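
To make the one-liner easier to follow, here is the same computation split into steps. The intermediate names hits and counts are only for illustration. The last two lines show what fixed = TRUE changes: the entries of x are matched literally instead of as regular expressions.

```r
x <- c("a", "b", "c", "kt")
y <- c("abs", "kot", "ccf", "okt", "kk", "y")

# One logical vector per entry of x, each as long as y.
hits <- lapply(x, grepl, y, fixed = TRUE)

# Adding the logical vectors element-wise counts the matches per element of y.
counts <- Reduce("+", hits)
counts
#[1] 2 0 1 1 0 0

y[counts == 0]
#[1] "kot" "kk"  "y"

# As a regular expression "." matches any character; with fixed = TRUE
# it only matches a literal dot.
grepl(".", c("a.b", "ab"))
#[1] TRUE TRUE
grepl(".", c("a.b", "ab"), fixed = TRUE)
#[1]  TRUE FALSE
```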
Sandipan Dey