
I'm looking for R code that subsets a data frame a, keeping the rows whose values match patterns in another vector k.

For example, consider

x <- c("a", "b", "c")
y <- 1:3
z <- c("foo", "bar", "null")
a <- data.frame(x, y, z)
a
#  x y    z
#1 a 1  foo
#2 b 2  bar
#3 c 3 null

Suppose that I have a vector k that I want to use to subset a, defined as

k <- c("b", "c")

If I use grepl with apply and sapply I can get the rows that match k, which is what I want.

a[as.logical(apply(sapply(k, grepl, a$x), 1, sum)),]

  x y    z
2 b 2  bar
3 c 3 null

This code, however, is REALLY slow when scaled up to large datasets. Is there a faster and simpler way of doing this?

Thanks,

Rafael

EDIT: I tried my best to find the answer to this question on Stack Overflow. Since I could not find it, I can assure you that the wording used in this post is unique and therefore a contribution to the forum.

1 Answer


A simple way in base R is to use %in%, which tests each element of a$x for exact membership in k:

a[ a$x %in% k , ]
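
Note that %in% does exact matching, whereas the grepl call in the question does regular-expression matching. If partial matches are actually needed, one possible sketch is to collapse k into a single alternation and call grepl once rather than once per pattern. This assumes every element of k is a valid regular expression; the names exact, partial and pattern below are only illustrative.

```r
# Data from the question
a <- data.frame(x = c("a", "b", "c"),
                y = 1:3,
                z = c("foo", "bar", "null"))
k <- c("b", "c")

# Exact matching: keep rows whose x is one of the values in k
exact <- a[a$x %in% k, ]

# Pattern matching: join the patterns with "|" and call grepl once
# (assumption: the elements of k are valid regular expressions)
pattern <- paste(k, collapse = "|")
partial <- a[grepl(pattern, a$x), ]
```

On the example data both approaches keep rows 2 and 3. They differ once k contains strings that appear only as part of a value in a$x: grepl would keep those rows, while %in% would not.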
David Heckmann