What are non-word boundaries in regex (\B), compared to word boundaries?

Question

Mark Byers · Accepted Answer · 2010-12-27T23:56:05.783

score 103

A word boundary (\b) is a zero-width match that can match:

Between a word character (\w) and a non-word character (\W) or
Between a word character and the start or end of the string.

In JavaScript, \w is defined as [A-Za-z0-9_], and \W matches any other character.
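As a sketch of the rule above (not how any regex engine actually implements it), the check can be written by hand: a position is a boundary when exactly one of its two neighbours is a word character, and positions outside the string count as non-word. The helper names isWordChar, isWordBoundaryAt and agreesWithRegex are hypothetical, and the sketch assumes the ASCII \w given above with no regex flags.

```javascript
// Hypothetical helpers illustrating the \b rule; not a built-in API.
// Assumes the ASCII definition of \w: [A-Za-z0-9_].
function isWordChar(ch) {
  // ch is undefined outside the string; treat that as non-word.
  return ch !== undefined && /[A-Za-z0-9_]/.test(ch);
}

// Position i sits between str[i - 1] and str[i] (0 <= i <= str.length).
function isWordBoundaryAt(str, i) {
  const before = isWordChar(str[i - 1]);
  const after = isWordChar(str[i]);
  // \b holds when exactly one side is a word character.
  return before !== after;
}

// Cross-check the hand-written rule against the engine's own \b
// at every position of the string.
function agreesWithRegex(str) {
  for (let i = 0; i <= str.length; i++) {
    const re = /\b/y; // sticky: only tries to match exactly at lastIndex
    re.lastIndex = i;
    if (re.test(str) !== isWordBoundaryAt(str, i)) return false;
  }
  return true;
}

console.log(agreesWithRegex("Hello, world!")); // true
console.log(isWordBoundaryAt("a b", 1)); // true  (between "a" and " ")
console.log(isWordBoundaryAt("ab", 1));  // false (between two word chars)
```

The sticky flag is used so that each test can only succeed at the one position being checked, which keeps the comparison position by position.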

The negated version of \b, written \B, is a zero-width match where the above does not hold. Therefore it can match:

Between two word characters.
Between two non-word characters.
Between a non-word character and the start or end of the string.
The empty string.
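The four cases can be checked directly. In this minimal sketch each pattern is anchored so that \B can only succeed at the position being tested; the sample strings are arbitrary choices.

```javascript
// One anchored pattern per case in the list above.
// \B is zero-width, so each pattern only tests a position.
const cases = [
  ["between two word chars", /a\Bb/, "ab"],
  ["between two non-word chars", /-\B-/, "--"],
  ["start of string before a non-word char", /^\B-/, "-x"],
  ["end of string after a non-word char", /-\B$/, "x-"],
  ["the empty string", /^\B$/, ""],
];

for (const [label, re, input] of cases) {
  console.log(`${label}: ${re.test(input)}`); // each line ends in "true"
}

// \b, by contrast, never matches in the empty string.
console.log(/\b/.test("")); // false
```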

For example, if the string is "Hello, world!", then \b matches in the following places:

 H e l l o ,   w o r l d !
^         ^   ^         ^

And \B matches those places where \b doesn't match:

 H e l l o ,   w o r l d !
  ^ ^ ^ ^   ^   ^ ^ ^ ^   ^
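The two diagrams can be reproduced by collecting the index of every zero-width match. This sketch assumes an engine with String.prototype.matchAll (ES2020); matchPositions is a hypothetical helper name. Index i means the position just before character i, so 13 is the end of "Hello, world!".

```javascript
// Collect every index where a (zero-width) pattern matches.
// matchAll requires the g flag and steps past empty matches by itself.
function matchPositions(str, re) {
  return [...str.matchAll(re)].map((m) => m.index);
}

const text = "Hello, world!";
console.log(matchPositions(text, /\b/g).join(", ")); // 0, 5, 7, 12
console.log(matchPositions(text, /\B/g).join(", ")); // 1, 2, 3, 4, 6, 8, 9, 10, 11, 13
```

Together the two lists cover every position from 0 to 13 exactly once, which is the sense in which \B matches wherever \b does not.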

edited Dec 27 '10 at 23:56

answered Dec 27 '10 at 20:28

Mark Byers



Nice one. In my experience, *explaining* word boundaries is considerably more difficult than *using* them. – Alan Moore Dec 27 '10 at 23:35

I have not seen such a lucid explanation of word boundaries before. Great one! – Salil Mar 01 '12 at 23:29
For the `\B` example, the label between `start of string` and `H` is missing. Nice explanation otherwise. – ericyan3000 May 29 '22 at 05:03

score 4 · Answer 2 · answered Jun 02 '15 at 12:29

The basic purpose of a non-word boundary is to create a regex that says:

if we are at the beginning/end of a word char (\w = [a-zA-Z0-9_]), make sure the previous/next character is also a word char,

e.g.: "a\B." ~ "a\w":

"ab", "a4", "a_", ... but not "a ", "a."
if we are at the beginning/end of a non-word char (\W = [^a-zA-Z0-9_]), make sure the previous/next character is also a non-word char,

e.g.: "-\B." ~ "-\W":

"-.", "- ", "--", ... but not "-a", "-1"
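The two "~" equivalences can be spot-checked on the sample inputs listed above. This is a minimal sketch; the patterns are anchored with ^ and $ (an addition, not part of the answer) so that each test covers the whole sample string. The equivalence is only approximate: without the s flag, . does not match line terminators while \W does, so "-\n" is one input where the two patterns disagree.

```javascript
// Each entry: [pattern using \B, "same class" counterpart, sample inputs].
const pairs = [
  [/^a\B.$/, /^a\w$/, ["ab", "a4", "a_", "a ", "a."]],
  [/^-\B.$/, /^-\W$/, ["-.", "- ", "--", "-a", "-1"]],
];

for (const [withB, withClass, inputs] of pairs) {
  for (const s of inputs) {
    // Both patterns should accept or reject each sample alike.
    console.log(JSON.stringify(s), withB.test(s), withClass.test(s));
  }
}

// The edge case: "." rejects the newline, "\W" accepts it.
console.log(/^-\B.$/.test("-\n"), /^-\W$/.test("-\n")); // false true
```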

For a word boundary it's similar, but instead of making sure that the adjacent characters are of the same class (word char/non-word char), they need to differ (the start and end of the string count as non-word); hence the name word boundary.
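Swapping \B for \b flips the requirement: the character after the position must now be of the other class. A minimal sketch, using anchored patterns and illustrative variable names:

```javascript
// With \b the neighbours must belong to different classes.
const wordThenBoundary = /^a\b.$/;    // roughly /^a\W$/ (. still excludes line terminators)
const nonWordThenBoundary = /^-\b.$/; // roughly /^-\w$/

console.log(["a ", "a.", "ab", "a4"].map((s) => wordThenBoundary.test(s)).join(", "));
// true, true, false, false
console.log(["-a", "-1", "--", "- "].map((s) => nonWordThenBoundary.test(s)).join(", "));
// true, true, false, false
```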
