
I am trying to use regular expressions in R to find the elements of a character vector that contain every unit in a set of patterns, regardless of the order in which the units appear.

mystring = c("sdxuslinafchangdfasd", "fdschangfsdahxufhglin", ",kjujudsyrg")
pattern = c("xu", "chang", "lin")

What I have in mind is:

grepl("xu", mystring) & grepl("chang", mystring) & grepl("lin", mystring)

This certainly gives me what I want, but as the number of strings in the pattern increases, the code becomes cumbersome. I know I can use the following if I only want to match any one of the patterns, but there seems to be no equivalent & operator inside a grepl pattern:

grepl("xu|chang|lin", mystring)

My question is: is there a fast and concise way to solve this problem in R when both the number of strings and the number of units in the pattern are large? Speed is the first priority and conciseness of the code is the second. Thanks.
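One possible way to generalize the chained grepl calls without typing each one out is sketched here. It assumes the units are literal substrings rather than regular expressions (hence fixed = TRUE); the names hits and all_match are made up for illustration. It has not been benchmarked.

```r
mystring <- c("sdxuslinafchangdfasd", "fdschangfsdahxufhglin", ",kjujudsyrg")
pattern <- c("xu", "chang", "lin")

# One logical vector per unit; fixed = TRUE treats each unit as a literal
# string (an assumption: drop it if the units are real regular expressions).
hits <- lapply(pattern, grepl, x = mystring, fixed = TRUE)

# Combine the per-unit results element-wise with &, which is the same as
# writing grepl(...) & grepl(...) & grepl(...) by hand.
all_match <- Reduce(`&`, hits)

print(all_match)
```

Each unit costs one pass over mystring, so the work grows with the number of units times the number of strings.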

Miao Cai
  • See [Regex to match string containing two names in any order](https://stackoverflow.com/questions/4389644/regex-to-match-string-containing-two-names-in-any-order). – Wiktor Stribiżew Dec 10 '17 at 22:29
  • @WiktorStribiżew Is that R code? It does not make sense to me in terms of the R language. Thanks. – Miao Cai Dec 10 '17 at 22:33
  • `grepl("(?s)^(?=.*xu)(?=.*chang)(?=.*lin)", mystring, perl=TRUE)` – Wiktor Stribiżew Dec 10 '17 at 22:37
  • @WiktorStribiżew Thanks for your help. I am not familiar with perl so I thought I could not use that code in R. Your answer is right and efficient. Thanks. – Miao Cai Dec 11 '17 at 00:08
  • That has got nothing to do with perl, `perl=TRUE` only makes R parse the pattern with a PCRE engine. `perl=TRUE` is a misnomer. – Wiktor Stribiżew Dec 11 '17 at 07:45
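The lookahead pattern from the comments can also be built from the pattern vector instead of being typed by hand. This is a sketch only: the variable names lookahead and result are made up for illustration, and it assumes the units contain no regex metacharacters (they would need escaping otherwise).

```r
mystring <- c("sdxuslinafchangdfasd", "fdschangfsdahxufhglin", ",kjujudsyrg")
pattern <- c("xu", "chang", "lin")

# Wrap each unit in a positive lookahead (?=.*unit) and concatenate them.
# Assumption: the units are plain text with no regex metacharacters.
lookahead <- paste0("(?s)^", paste0("(?=.*", pattern, ")", collapse = ""))

# perl = TRUE selects the PCRE engine, which supports lookaheads.
result <- grepl(lookahead, mystring, perl = TRUE)

print(lookahead)
print(result)
```

This needs a single grepl call however many units there are. Whether it is faster than combining separate grepl calls with & would have to be measured on the real data.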
