String:
this is something that should work (bs) sdf
RegEx
\b\(bs\)\b
Shows no matches found. Why?
Here it is on Rubular: http://rubular.com/r/jX2Hy6O0XG
The reason there is no match is as follows.
A word boundary (\b) can be written out as the following equivalent expression:
(?:                          # Cluster start
  (?:                        # -------
    ^                        # Beginning of string anchor
    |                        # or,
    (?<= [^a-zA-Z0-9_] )     # Lookbehind assertion for a char that is NOT a word char
  )                          # -------
  (?= [a-zA-Z0-9_] )         # Lookahead assertion for a char that IS a word char
  |                          # or,
  (?<= [a-zA-Z0-9_] )        # Lookbehind assertion for a char that IS a word char
  (?:                        # -------
    $                        # End of string anchor
    |                        # or,
    (?= [^a-zA-Z0-9_] )      # Lookahead assertion for a char that is NOT a word char
  )                          # -------
)                            # Cluster end
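As a rough check of the expansion above, the sketch below compares the positions where the built-in \b matches with the positions where the spelled-out expression matches. It uses Python's `re` module, which is an assumption (the question was tested on Rubular, i.e. Ruby), and `re.ASCII` so that \b uses the same [a-zA-Z0-9_] class as the expansion. The names `EXPANDED`, `BUILTIN` and `positions` are made up for this illustration.

```python
import re

# The expanded word-boundary expression from above, in verbose mode.
EXPANDED = re.compile(r"""
    (?:
        (?: ^ | (?<=[^a-zA-Z0-9_]) )   # start of string, or a non-word char behind
        (?=[a-zA-Z0-9_])               # and a word char ahead
      |
        (?<=[a-zA-Z0-9_])              # a word char behind
        (?: $ | (?=[^a-zA-Z0-9_]) )    # and end of string, or a non-word char ahead
    )
""", re.VERBOSE)

# re.ASCII restricts \b to the [a-zA-Z0-9_] notion of a word character.
BUILTIN = re.compile(r"\b", re.ASCII)

def positions(pattern, s):
    """Return every index at which the zero-width pattern matches."""
    return [m.start() for m in pattern.finditer(s)]

text = "this is something that should work (bs) sdf"
print(positions(BUILTIN, text) == positions(EXPANDED, text))

# There is no boundary directly before "(" or directly after ")".
open_paren = text.index("(")
after_close = text.index(")") + 1
print(open_paren in positions(BUILTIN, text))
print(after_close in positions(BUILTIN, text))
```

The two position lists agree for this string, and neither contains the spot before ( or the spot after ), which is why the original pattern cannot match there.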
So what does \b\( match?
Since ( is not a word character, the \b in front of it requires a word character to its left,
i.e. \b\( behaves like (?<=[a-zA-Z0-9_])\(. But what comes before ( in the string is a space,
therefore, no match.
The same goes for \)\b, which behaves like \)(?=[a-zA-Z0-9_]),
but again, what comes after ) is a space.
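A small sketch of that reasoning, assuming Python's `re` module rather than Ruby (the behaviour of \b is the same for these inputs). The string "a(bs)b" is a made-up example added to show the one situation in which the original pattern does match.

```python
import re

text = "this is something that should work (bs) sdf"

# The original pattern: no match, because "(" is preceded by a space
# and ")" is followed by a space (non-word char next to non-word char).
original = re.search(r"\b\(bs\)\b", text)

# Without the boundaries the literal text is found.
plain = re.search(r"\(bs\)", text)

# With word characters on both sides, \b is satisfied and the pattern matches.
glued = re.search(r"\b\(bs\)\b", "a(bs)b")

print(original)        # None
print(plain.group())   # (bs)
print(glued.group())   # (bs)
```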
If you would like a whitespace boundary, you'd use
(?<!\S)(..)(?!\S)
which ensures whitespace or a beginning-of-string/end-of-string position before and after.
Or, if you need to ensure there is no word boundary, use the negative word boundary
\B(..)\B
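To compare the two alternatives, here is a hedged sketch in Python's `re` module (an assumption; the question used Rubular). The (..) placeholder is filled in with \(bs\), and the sample strings other than the one from the question are invented for illustration.

```python
import re

# "(..)" in the patterns above is a placeholder; \(bs\) is substituted here.
whitespace_bounded = re.compile(r"(?<!\S)\(bs\)(?!\S)")
non_word_bounded = re.compile(r"\B\(bs\)\B")

samples = ["work (bs) sdf", "x,(bs),y", "a(bs)b", "(bs)"]
results = {
    s: (bool(whitespace_bounded.search(s)), bool(non_word_bounded.search(s)))
    for s in samples
}

for s, (ws, nb) in results.items():
    print(f"{s!r}: whitespace boundary={ws}, \\B boundary={nb}")
```

The two differ on "x,(bs),y": the comma is not whitespace, so the whitespace-boundary version rejects it, while \B accepts it because a comma next to a parenthesis is not a word boundary.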
The reason there is no match is that there is no word boundary between a space and (, or between ) and a space.
Here is what a word boundary matches:
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
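The three cases can be seen by listing the indexes at which \b matches. This is a minimal sketch assuming Python 3.7 or later (where `re.finditer` reports every zero-width match); `boundary_positions` is a hypothetical helper name and the sample strings are invented.

```python
import re

def boundary_positions(s):
    """Return the indexes in s at which \\b (a word boundary) matches."""
    return [m.start() for m in re.finditer(r"\b", s)]

# Start of string (0), "b"|" " (2), "("|"c" (4), end of string (5).
print(boundary_positions("ab (c"))   # [0, 2, 4, 5]

# The string starts and ends with a non-word char, so only the inner
# transitions "("|"a" and "b"|")" count.
print(boundary_positions("(ab)"))    # [1, 3]

# No word characters at all: no word boundaries anywhere.
print(boundary_positions(" ( "))     # []
```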
If you need to match (bs) only when it is not enclosed by word chars, use
(?<!\w)\(bs\)(?!\w)
See a Rubular demo.
Details
- (?<!\w) - a negative lookbehind that matches a location in the string that is not immediately preceded by a word char
- \(bs\) - a literal (bs) string
- (?!\w) - a negative lookahead that matches a location that is not immediately followed by a word char.
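A quick way to try the lookaround version, sketched with Python's `re` module (an assumption, since the demo linked above is on Rubular). Apart from the string from the question, the inputs are made-up examples.

```python
import re

pattern = re.compile(r"(?<!\w)\(bs\)(?!\w)")

samples = [
    "this is something that should work (bs) sdf",  # spaces on both sides
    "x,(bs),y",                                     # punctuation on both sides
    "a(bs)b",                                       # word chars on both sides
    "(bs)_",                                        # underscore is a word char
]
matched = {s: pattern.search(s) is not None for s in samples}

for s, ok in matched.items():
    print(f"{s!r}: {'match' if ok else 'no match'}")
```

Unlike \b, the lookarounds do not require a word character on the other side; they only forbid one, so the pattern also works at the start or end of the string and next to punctuation.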