0

String:

this is something that should work (bs) sdf

RegEx

\b\(bs\)\b

Shows no matches found. Why?

Here it is on Rubular: http://rubular.com/r/jX2Hy6O0XG

TylerH
Anthony
  • And a usual hint: if you need to match the `(bs)` when not enclosed with word chars, use `(?<!\w)\(bs\)(?!\w)`. – Wiktor Stribiżew Aug 31 '17 at 20:37
  • can you please let me know what the significance of `<!` is in your expression? – Anthony Aug 31 '17 at 20:43
  • A negative lookbehind. Fails the match if its pattern is matched immediately to the left of the current position. – Wiktor Stribiżew Aug 31 '17 at 20:46
  • Please re-close as duplicate of [How exactly do Regular Expression word boundaries work in PHP?](https://stackoverflow.com/questions/6531724/how-exactly-do-regular-expression-word-boundaries-work-in-php). – Wiktor Stribiżew Aug 31 '17 at 21:10
  • Shall we close every question on SO about word boundaries? There are literally 10,000+. Each one is different. But it is a complex subject, like this question's variation. –  Aug 31 '17 at 21:14
  • @sln Every question that has this very wording and same type of chars near `\b`. They all have one and the same root cause: OP does not understand what a word boundary matches in regex. Thus, the close reason is a post explaining the identical situation (there, `\b` is used before `@`, a non-word char, same as `(`). – Wiktor Stribiżew Aug 31 '17 at 21:15
  • Close them all or don't close any, can't have it both ways. Hey, thanks for the regex lesson ... –  Aug 31 '17 at 21:17
  • Not sure what you're talking about, but you brought up points, not me. –  Aug 31 '17 at 21:18
  • What do you mean about points? My answer is a Wiki answer, not giving me any points. – Wiktor Stribiżew Aug 31 '17 at 21:19
  • You know, the comment you deleted saying open it to just get 25 points. –  Aug 31 '17 at 21:20
  • If I told you how trivial and easy the C code is to determine boundaries you wouldn't believe it. Yet, least understood... –  Aug 31 '17 at 21:22

2 Answers

2

The reason there is no match is as follows.

A word boundary is defined as

 (?:                           # Cluster start
      (?:                           # -------
           ^                             # Beginning of string anchor
        |                              # or,
           (?<= [^a-zA-Z0-9_] )          # Lookbehind assertion for a char that is NOT a word char
      )                             # -------
      (?= [a-zA-Z0-9_] )            # Lookahead assertion for a char that IS a word char

   |                              # or,

      (?<= [a-zA-Z0-9_] )           # Lookbehind assertion for a char that IS a word char
      (?:                           # -------
           $                             # End of string anchor
        |                              # or,
           (?= [^a-zA-Z0-9_] )           # Lookahead assertion for a char that is NOT a word char
      )                             # -------
 )                             # Cluster end
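
To check the expansion above against a real engine, here is a minimal Python sketch. It assumes Python's `re` module with the `re.ASCII` flag, so that `\b` uses the same `[a-zA-Z0-9_]` word class as the expansion; the names `EXPANDED_BOUNDARY`, `BUILTIN_BOUNDARY` and `boundary_positions` are made up for this illustration.

```python
import re

WORD = "[a-zA-Z0-9_]"
NOT_WORD = "[^a-zA-Z0-9_]"

# The expanded definition from above, written on one line.
EXPANDED_BOUNDARY = re.compile(
    rf"(?:(?:^|(?<={NOT_WORD}))(?={WORD})|(?<={WORD})(?:$|(?={NOT_WORD})))"
)
# Built-in word boundary, restricted to ASCII word characters.
BUILTIN_BOUNDARY = re.compile(r"\b", re.ASCII)


def boundary_positions(pattern, text):
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(text)]


text = "this is something that should work (bs) sdf"
expanded = boundary_positions(EXPANDED_BOUNDARY, text)
builtin = boundary_positions(BUILTIN_BOUNDARY, text)

print(expanded == builtin)          # True: both find the same positions
print(text.index("(") in builtin)   # False: no boundary between " " and "("
```

The second line of output is the whole problem in the question: the position right before `(` is not a word boundary, so `\b\(` cannot match there.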

So what does \b\( match?

Since ( is not a word character, \b before it requires a word character to the left,

i.e. (?<=[a-zA-Z0-9_])\(. But what comes before it is a space,
so there is no match.

The same goes for \)\b, i.e. \)(?=[a-zA-Z0-9_]), but again, what comes after it is a space.
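
As a quick illustration of that reasoning, here is a small Python sketch (Python's `re` is assumed here; the question itself used Rubular). The second input, `x(bs)y`, is a made-up string in which word characters touch both parentheses.

```python
import re

pattern = re.compile(r"\b\(bs\)\b")

# The string from the question: spaces surround "(bs)", so neither \b can match.
text = "this is something that should work (bs) sdf"
original_match = pattern.search(text)
print(original_match)  # None

# Hypothetical input where a word character sits directly next to each parenthesis.
glued = "x(bs)y"
glued_match = pattern.search(glued)
print(glued_match.group(0) if glued_match else None)  # (bs)
```

So the original pattern only matches in the one situation the asker most likely did not want: when `(bs)` is glued to word characters on both sides.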

If you would like a whitespace boundary, you'd use

(?<!\S)(..)(?!\S) which ensures whitespace or beginning-of-string/end-of-string positions before and after.

Or, if you need to ensure there is no word boundary, use the negated word boundary

\B(..)\B
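
The two alternatives are not interchangeable, which a rough Python sketch can show. Here `\(bs\)` is substituted for the `(..)` placeholder, and the sample strings other than the first are invented for this comparison.

```python
import re

# Whitespace boundary: whitespace or string start/end on both sides.
whitespace_bounded = re.compile(r"(?<!\S)\(bs\)(?!\S)")
# Negated word boundary: no word boundary on either side.
non_word_bounded = re.compile(r"\B\(bs\)\B")

samples = ["work (bs) sdf", "(bs)", "work (bs), sdf", "x(bs)y"]

results = {
    s: (bool(whitespace_bounded.search(s)), bool(non_word_bounded.search(s)))
    for s in samples
}

for s, (ws, nb) in results.items():
    print(f"{s!r:20} whitespace={ws} non_word={nb}")
```

Both patterns accept the first two samples and reject the last one. They differ on `work (bs), sdf`: the comma is not whitespace, so the whitespace version rejects it, while `\B` accepts it because `)` and `,` are both non-word characters.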

0

There is no match because there is no word boundary between a space and (, nor between ) and a space.

See what a word boundary matches:

There are three different positions that qualify as word boundaries:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

If you need to match the (bs) when not enclosed with word chars, use

(?<!\w)\(bs\)(?!\w)

See a Rubular demo.

Details

  • (?<!\w) - a negative lookbehind that matches a location in the string that is not immediately preceded by a word char
  • \(bs\) - a literal (bs) string
  • (?!\w) - a negative lookahead that matches a location that is not immediately followed by a word char.
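
For readers without access to the demo, the pattern can be tried with a short Python sketch. Python's `re` is an assumption here (the demo used Rubular), and `find_spans` plus the second test string are made up for illustration.

```python
import re

lookaround = re.compile(r"(?<!\w)\(bs\)(?!\w)")


def find_spans(text):
    """Return the (start, end) span of every match in text."""
    return [m.span() for m in lookaround.finditer(text)]


# The string from the question: one match, at the "(bs)" token.
question_spans = find_spans("this is something that should work (bs) sdf")
print(question_spans)  # [(35, 39)]

# Hypothetical string: only the last "(bs)" has no word char on either side.
mixed_spans = find_spans("x(bs) (bs)1 -(bs)-")
print(mixed_spans)  # [(13, 17)]
```

In the second string, the first `(bs)` is rejected by the lookbehind (preceded by `x`) and the second by the lookahead (followed by `1`).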
Wiktor Stribiżew