I need to construct a regular expression dynamically in order to perform an exact match in R using grep(). This is the code I have:

names = c('John Doe', 'John-Doe', 'Doe John', 'Doe-John', 'John', 'Doe')

for(name in names) {
    pattern = paste('(?<![A-z]-)^\\b', name, '\\b$(?!-[A-z])', sep = '')
    index = grep(pattern, names)

    print(index)
}

Desired output:

  • each name must be matched exactly against the elements of the names vector
  • e.g., John should return only index 5 and nothing else

I tested my regular expression here https://regex101.com/r/uJhJwS/2 and it appears to work fine. However, I get the following error in R:

Error in grep(pattern, names) : 
  invalid regular expression '(?<![A-z]-)John Do$(?!-[A-z])', reason 'Invalid regexp'

What is going wrong?

Toto
Mihai
  • Your regex does not work as you think it does, it is equal to [`^John$`](https://regex101.com/r/uJhJwS/3), and that is what you are looking for: an exact match. – Wiktor Stribiżew Nov 05 '17 at 21:14
  • You can do this using `sapply` to examine each element of `names` and `which` to look for exact matches, without regex: `sapply(names, function(x) which(names == x))` – neilfws Nov 05 '17 at 21:18
  • @WiktorStribiżew I was using `\bJohn\b` prior to this and it didn't work. That's the only reason why I tried the `lookarounds`. But the `^$` helped me solve it. However, why did you say that the `regex` is equal to `^John$`? Thanks. – Mihai Nov 05 '17 at 21:51
  • @F.Gran The pattern in the regex101 fiddle is `(?<![A-z]-)^\bJohn\b$(?!-[A-z])`. The `(?<![A-z]-)` is a negative lookbehind that is executed at the start of the string (as it is right before `^`) and will always be true (i.e. it will always match) since there is no `[A-z]-` pattern before the start of the string. Similarly, there is no character after the end of the string, and thus `$(?!-[A-z])` is equal to `$` (the `(?!-[A-z])` will always return *true*). Also, note that [`[A-z]` matches more than ASCII letters](https://stackoverflow.com/questions/29771901). – Wiktor Stribiżew Nov 05 '17 at 22:13
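
Putting the comments together, a minimal sketch of the exact match might look like the following. It assumes the names contain no regex metacharacters (such as `.` or `(`), and `exact_index` is a hypothetical helper introduced here for illustration, not something from the original post. The last line also shows that the original lookaround pattern should compile once `perl = TRUE` is passed, since R's default regex engine does not support lookarounds.

```r
# Same vector as in the question.
names <- c('John Doe', 'John-Doe', 'Doe John', 'Doe-John', 'John', 'Doe')

# Hypothetical helper (not from the original post): index of the elements
# that equal `name` exactly, using a pattern anchored with ^ and $.
# Assumes `name` contains no regex metacharacters such as '.' or '('.
exact_index <- function(name, candidates) {
    grep(paste0('^', name, '$'), candidates)
}

for (name in names) {
    print(exact_index(name, names))
}

# Regex-free alternative, along the lines of the `which` suggestion above.
which(names == 'John')

# The original lookaround pattern needs the PCRE engine (perl = TRUE).
grep('(?<![A-z]-)^\\bJohn\\b$(?!-[A-z])', names, perl = TRUE)
```

If the names may contain special characters, comparing with `==` is probably the safer choice, since it sidesteps pattern escaping entirely.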
