
text = 'this is ; an example'

Language is R. I'd like to understand why:

grepl("\\<is\\>",text)

returns TRUE

while

grepl("\\<;\\>",text)

returns FALSE

Note that setting the perl argument to TRUE or FALSE doesn't make any difference. I know that grepl(";",text) works; my question is why it stops working once we add word boundaries.

Antoine
  • I posted an answer since the common [What does this regex mean](https://stackoverflow.com/questions/22937618/) post does not cover TRE library and these patterns in the question. – Wiktor Stribiżew Feb 19 '18 at 11:42

1 Answer


The \< is a leading word boundary and the \> is a trailing word boundary. So, the char right after \< must be a word char, and the char right before \> must be a word char.

The ; is not a word char, so \<;\> can never match any string. The \<; part means "match a ; that is immediately preceded by a leading word boundary", and the ;\> part means "match a ; that is immediately followed by a trailing word boundary". Both conditions require ; itself to be a word char, which it is not.
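A minimal sketch of the above, using the `text` value from the question and R's default regex engine for the `\\<` and `\\>` patterns. The last two lines are an assumption on my side rather than part of the explanation: one possible way to test for a `;` that is not glued to a word char, written with PCRE lookarounds (`perl = TRUE`). The variable names are made up for illustration.

```r
text <- "this is ; an example"

# "is" is made of word chars, so \< and \> can both be satisfied: TRUE
word_hit <- grepl("\\<is\\>", text)

# ";" is not a word char, so no word boundary can sit next to it: FALSE
semi_hit <- grepl("\\<;\\>", text)

# Sanity check: ";" does not belong to the word-char class \w: FALSE
semi_is_word <- grepl("\\w", ";")

# Assumed alternative (not from the answer): require that no word char
# touches the ";" on either side, using PCRE lookarounds.
semi_alone <- grepl("(?<!\\w);(?!\\w)", text, perl = TRUE)   # TRUE
semi_glued <- grepl("(?<!\\w);(?!\\w)", "a;b", perl = TRUE)  # FALSE

print(c(word_hit, semi_hit, semi_is_word, semi_alone, semi_glued))
```

Whether the lookaround version is appropriate depends on what "a standalone ;" should mean for your data; a plain `grepl(";", text, fixed = TRUE)` is enough if any `;` counts.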

Wiktor Stribiżew
  • thanks, this is good to know. Are all punctuation marks not word chars? – Antoine Feb 19 '18 at 11:43
  • No punctuation symbols are word chars, 100% true. Note that word chars are: letters, digits and `_`. [`"\\<Abc\\>"` won't match `Abc` in `_Abc_`](https://ideone.com/6hFmoC) – Wiktor Stribiżew Feb 19 '18 at 11:45