Subset a data.table by a vector of substrings

Question

Suppose we have this data.table X:

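library(data.table)
# Generate n random alphanumeric strings of the given length
Random <- function(n = 1, length = 6) {
  randomString <- character(n)
  for (i in seq_len(n)) {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    length, replace = TRUE), collapse = "")
  }
  randomString
}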

X <- data.table(A = rnorm(11000, sd = 0.8),
                B = rnorm(11000, mean = 10, sd = 3),
                C = sample( LETTERS[1:24], 11000, replace=TRUE),
                D = sample( letters[1:24], 11000, replace=TRUE),
                E = round(rnorm(11000,mean=25, sd=3)),
                F = round(runif(n = 11000,min = 1000,max = 25000)),
                G = round(runif(11000,0,200000)),
                H = Random(11000))

I want to subset it by several substrings. Here, we will look for g, F and d in column H.

There is already a solution for a single substring: How to select R data.table rows based on substring match (a la SQL like)

If we only want g, using the data.table package:

X[like(H,pattern = "g")]

But my problem is to replicate this for g, F and d in a single operation.

Vec <- c("g","F","d")
Newtable <- X[like(H,pattern = Vec)]
Warning message:
In grep(pattern, levels(vector)) :
  argument 'pattern' has length > 1 and only the first element will be used

Is there a way to do this without creating 3 tables, merging them and removing duplicates?

I think `like` will take only a single element instead of a vector. Try using `Vectorize` — akrun, Jul 27 '16 at 08:44
@Akrun You're right, and it is my problem. I don't know any function which is able to take a vector for this operation. Btw, thanks for helping. — ARandomUser, Jul 27 '16 at 08:46

akrun · Accepted Answer · score 4 · 2016-07-27T09:12:48.240

We can use `grep` after `paste`-ing the vector into a single string, collapsed with `|`, so that the pattern matches any of the substrings.

X[grep(paste(Vec, collapse="|"), H)]

Or we can pass the same collapsed pattern to `like` (as suggested by @Tensibai)

X[like(H, pattern = paste(Vec, collapse="|"))]
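For a quick check, here is a minimal sketch of the same idea on a tiny table. The five values in `H` and the `id` column are made up for illustration and are not from the question's data. Note that the collapsed string is treated as a regular expression, so substrings containing characters such as `.` or `+` would need escaping first.

```r
library(data.table)

# Small stand-in for the question's table; the values in H are made up
X <- data.table(id = 1:5,
                H  = c("abg123", "XYZ789", "00F00z", "dddddd", "QRS456"))

Vec <- c("g", "F", "d")

# Collapse the vector into one regular expression: "g|F|d"
pattern <- paste(Vec, collapse = "|")

# Keep the rows where H contains at least one of the substrings
res <- X[grepl(pattern, H)]

# The %like% operator from data.table accepts the same pattern
res_like <- X[H %like% pattern]

print(res$id)
```

Matching is case-sensitive here, so "XYZ789" and "QRS456" are dropped even though the alphabet overlaps.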

edited Jul 27 '16 at 09:12

answered Jul 27 '16 at 08:56

`like` is using `grepl` under the hood, so I assume the `paste` method should also work on `like`'s `pattern` argument – Tensibai Jul 27 '16 at 09:10

score 1 · Answer 2 · answered Jul 27 '16 at 09:30

I think you can also use this:

NewTable <- X[grepl("g", H) | grepl("F", H) | grepl("d", H)]
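If the vector of substrings can have any length, the chain of `|` could probably be built programmatically. The following is only a sketch in base R on made-up strings; it uses `fixed = TRUE`, which is my own choice here, so that each substring is matched literally rather than as a regular expression.

```r
# Made-up strings standing in for column H
H <- c("abg123", "XYZ789", "00F00z", "dddddd", "a.b+c")
Vec <- c("g", "F", "d")

# One logical vector per substring; fixed = TRUE means literal matching
hits <- lapply(Vec, function(s) grepl(s, H, fixed = TRUE))

# Combine the logical vectors with element-wise OR
keep <- Reduce(`|`, hits)

print(H[keep])
```

When `H` is a column of a data.table `X`, the resulting logical vector can be used as the row index, as in `X[keep]`.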

advek88
