Difference between \b and \B in regex

Question

I am reading a book on regular expressions and I came across this example for \b:

The regex \bcat\b will match the word cat but not the cat in scattered.

For \B the author uses the following example:

The regex \B-\B matches the - in color - coded. On the other hand, \b-\b matches the - in nine-digit and pass-key.

How come in the first example we use \b to separate cat and in the second use \B to separate -? Using \b in the second example does the opposite of what it did earlier.

Please explain the difference to me.

EDIT: Also, can anyone please explain with a new example?

@YohanesAI The book was Sams Teach Yourself Regular Expressions in 10 Minutes by Ben Forta — stirredo, Jul 23 '21 at 14:30

score 137 · Accepted Answer · answered Jul 12 '11 at 12:41


The confusion stems from your thinking \b matches spaces (probably because "b" suggests "blank").

\b matches the empty string at the beginning or end of a word. \B matches the empty string anywhere that is not the beginning or end of a word. The key here is that "-" is not part of a word. So <left>-<right> matches \b-\b because there are word boundaries on either side of the -. On the other hand, for <left> - <right> (note the spaces), there are no word boundaries immediately on either side of the dash: the word boundaries are one space further left and right.

On the other hand, when searching for \bcat\b word boundaries behave more intuitively, and it matches " cat " as expected.
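Running both dash patterns from the book over the question's own examples shows this directly. This is a minimal sketch using Python's `re` module (chosen only as a convenient test bed; the sample string is assembled from the question's examples), where `re.sub` wraps every matched dash in brackets so you can see which ones each pattern picked:

```python
import re

# Sample text assembled from the question's examples (for illustration only)
text = "nine-digit, pass-key, color - coded"

# \b-\b: the dash must sit directly between two word characters
with_b = re.sub(r"\b-\b", "[-]", text)

# \B-\B: the dash must sit between two non-word characters (here, spaces)
with_B = re.sub(r"\B-\B", "[-]", text)

print(with_b)  # nine[-]digit, pass[-]key, color - coded
print(with_B)  # nine-digit, pass-key, color [-] coded
```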

andrewdski


Yes, I was indeed confusing \b with a blank space. However, I still feel a little confused. Can I ask you for one more example? – stirredo Jul 12 '11 at 13:06

The key is that `-` is not considered part of a word. Similarly, `!` is not a part of a word. So again `\b!\b` matches "uunet!iamold", but not "Wow! You are." You can try this stuff out at http://regexpal.com. – andrewdski Jul 13 '11 at 05:35
@andrewdski In my case \b also works around punctuation. I tried \b[A-Z0-9]+\b on "1987894, 3219800; 234567, 345261." and it works fine: I get only the numbers. – gunzapper Apr 09 '14 at 14:15

Just adding that a word in regex is composed of letters (a–z and A–Z), digits, and the underscore (_). Everything else is a non-word character. – Maralc Aug 25 '15 at 01:31

Could someone elaborate on this line: `\B matches the empty string not at the beginning or end of a word` – Arun Gowda May 21 '19 at 12:13

I'd stress **boundaries** a bit more, maybe typographically, even better to move the right hint (*b* means *boundary*) to the top of the answer. – Wolf Jan 09 '20 at 09:47

\B is the negated version of \b – vomi Feb 27 '20 at 14:24

score 98 · Answer 2 · answered Jul 12 '11 at 12:30


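\b is a zero-width word boundary. Specifically, it matches at a position between a word character (\w) and a non-word character (\W), as well as before the first character or after the last character of the string when that character is a word character.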

Example: .\b matches c in abc

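\B is a zero-width non-word boundary. Specifically, it matches at a position between two word characters (\w\w) or between two non-word characters (\W\W), including the start or end of the string next to a non-word character.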

Example: \B.\B matches b in abc
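Both examples can be reproduced with Python's `re` module, used here only as a convenient test bed. `re.findall` returns every non-overlapping match, which makes the zero-width behaviour visible: the anchors constrain where `.` may match but contribute no characters to the result.

```python
import re

# .\b: any character immediately followed by a word boundary
end_char = re.findall(r".\b", "abc")

# \B.\B: any character with no word boundary on either side
middle_char = re.findall(r"\B.\B", "abc")

print(end_char)     # ['c']
print(middle_char)  # ['b']
```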

See regular-expressions.info for more great regex info

Bohemian


+1 because zero-width is an important part of the definition. If it weren't zero-width, then it would also grab those word/non-word characters in the matching part of the pattern. – Ben Hocking Jul 12 '11 at 12:33

In other words, \B matches the spot between \W and \W or between \w and \w, but not between \W and \w. – Jul 12 '11 at 12:38

This answer has been added to the [Stack Overflow Regular Expression FAQ](http://stackoverflow.com/a/22944075/2736496), under "Anchors". – aliteralmind Apr 10 '14 at 00:21
Following up on this: @Bohemian why is it that in JS `"abc def".match(/\b./)` returns `['a']` as opposed to what [http://www.regular-expressions.info/refwordboundaries.html] (Your source: regex.info) says `['a', ' ', 'd']`. – steviesh Jun 11 '16 at 15:23

@stephenhuh `string.match()` returns only the first match, unless you add the *global* flag `g`: `"abc def".match(/\b./g)` returns `['a', ' ', 'd']` – Bohemian Jun 11 '16 at 20:34

I think this is the better answer. It should also be the accepted one because it resolves the confusion. Much more to learn here. – Wolf Jan 09 '20 at 09:49

score 66 · Answer 3 · answered Apr 11 '16 at 11:21

With a different example:

Consider this string, where the pattern to be searched for is 'cat':

text = "catmania thiscat thiscatmaina";

Now the definitions:

'\b' anchors the pattern at the beginning or end of a word.

'\B' anchors the pattern at a position that is not the beginning or end of a word.

Different Cases:

Case 1: At the beginning of each word

result = text.replace(/\bcat/g, "ct");

Now, result is "ctmania thiscat thiscatmaina"

Case 2: At the end of each word

result = text.replace(/cat\b/g, "ct");

Now, result is "catmania thisct thiscatmaina"

Case 3: Not in the beginning

result = text.replace(/\Bcat/g, "ct");

Now, result is "catmania thisct thisctmaina"

Case 4: Not in the end

result = text.replace(/cat\B/g, "ct");

Now, result is "ctmania thiscat thisctmaina"

Case 5: Neither beginning nor end

result = text.replace(/\Bcat\B/g, "ct");

Now, result is "catmania thiscat thisctmaina"
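The cases above use JavaScript's `replace` with the `g` flag. For readers working in Python, a rough equivalent is sketched below with `re.sub`, which replaces every match by default; the case numbers mirror the ones above:

```python
import re

text = "catmania thiscat thiscatmaina"

# Case number -> pattern, mirroring Cases 1-5 above
cases = {
    1: r"\bcat",    # at the beginning of a word
    2: r"cat\b",    # at the end of a word
    3: r"\Bcat",    # not at the beginning
    4: r"cat\B",    # not at the end
    5: r"\Bcat\B",  # neither beginning nor end
}

# re.sub replaces all matches by default, like /.../g in JavaScript
results = {n: re.sub(pattern, "ct", text) for n, pattern in cases.items()}

for n, result in results.items():
    print(f"Case {n}: {result}")
```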

Hope this helps :)

Correct me if I'm wrong, but when using \bcat\b, if our string had been, for example, "catcat is my cat", wouldn't the first word (catcat) also satisfy this condition? – Kosem, Jun 17 '19 at 08:05
@Kosem this is a good question and made me study `\b` better. What I understood after is that the reason why the first word doesn't match is that with `\bcat\b` you are saying that cat must be surrounded by word boundaries. `catcat` starts with a word boundary, the first inner `cat` is found [we are still respecting the regex] but then the tail `\b` is imposing that after the first `t` in `catcat` there should be another word boundary, which is NOT the case because the second `c` in `catcat` is another word character [= word goes on]. Similar reason for skipping the second `cat` of `catcat` — Antonino, Oct 20 '20 at 07:33

score 11 · Answer 4 · answered Jul 12 '11 at 12:31

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.

There are three different positions that qualify as word boundaries:

Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.

Source: http://www.regular-expressions.info/wordboundaries.html
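To see that \b and \B really are complements, you can list the positions where each zero-width assertion matches. Positions here are offsets between characters, from 0 to len(s), so the end of the string counts as one. A sketch in Python using `re.finditer` with a pattern that consists only of the assertion; the sample string is arbitrary:

```python
import re

s = "Hi, cat!"

# Offsets where a word boundary occurs (each match is empty, so only .start() matters)
b_positions = [m.start() for m in re.finditer(r"\b", s)]

# Offsets where there is no word boundary
B_positions = [m.start() for m in re.finditer(r"\B", s)]

print(b_positions)  # [0, 2, 4, 7]
print(B_positions)  # [1, 3, 5, 6, 8]

# Together they cover every offset 0..len(s) exactly once
assert sorted(b_positions + B_positions) == list(range(len(s) + 1))
```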

score 6 · Answer 5 · answered Oct 06 '19 at 20:40

Source © Copyright RexEgg.com

Word Boundary: \b

The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore—but see below for variations across engines) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).

The regex \bcat\b would, therefore, match cat in a black cat, but it wouldn't match it in catatonic, tomcat or certificate. Removing one of the boundaries, \bcat would match cat in catfish, and cat\b would match cat in tomcat, but not vice-versa. Both, of course, would match cat on its own.

Not-a-word-boundary: \B

\B matches all positions where \b doesn't match. Therefore, it matches:

✽ When neither side is a word character, for instance at any position in the string $=(@-%++) (including the beginning and end of the string)

✽ When both sides are a word character, for instance between the H and the i in Hi!

This may not seem very useful, but sometimes \B is just what you want. For instance,

✽ \Bcat\B will find cat fully surrounded by word characters, as in certificate, but neither on its own nor at the beginning or end of words.

✽ cat\B will find cat both in certificate and catfish, but neither in tomcat nor on its own.

✽ \Bcat will find cat both in certificate and tomcat, but neither in catfish nor on its own.

✽ \Bcat|cat\B will find cat in embedded situations, e.g. in certificate, catfish or tomcat, but not on its own.
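The four \B variants above, plus \bcat\b for comparison, can be checked against the same sample words with a few lines of Python. This is only a sketch; `re.search` reports whether the pattern matches anywhere in the word:

```python
import re

words = ["cat", "certificate", "catfish", "tomcat"]
patterns = [r"\bcat\b", r"\Bcat\B", r"cat\B", r"\Bcat", r"\Bcat|cat\B"]

# For each pattern, keep the words in which it finds a match
matches = {p: [w for w in words if re.search(p, w)] for p in patterns}

for p, found in matches.items():
    print(f"{p:12} {found}")
```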

score 4 · Answer 6 · answered May 31 '20 at 13:37


\b is used as a word boundary.

import re
word = "categorical cat"

Find all "cat" in the string above:

without \b

re.findall(r'cat',word)
['cat', 'cat']

with \b

re.findall(r'\bcat\b',word)
['cat']

Kavyajeet Bora


score 3 · Answer 7 · edited Mar 25 '17 at 13:48

Let's take a string like:

Note: the underscore (_) is not considered a special character here; it counts as a word character. In the list below, X is assumed to begin and end with a word character, and "a special character or whitespace" also covers the start or end of the string.

/\bX\b/g X should be preceded and followed by a special character or whitespace

/\bX/g X should be preceded by a special character or whitespace

/X\b/g X should be followed by a special character or whitespace

/\BX\B/g X should be neither preceded nor followed by a special character or whitespace

/\BX/g X should not be preceded by a special character or whitespace

/X\B/g X should not be followed by a special character or whitespace

/\bX\B/g X should be preceded, but not followed, by a special character or whitespace

/\BX\b/g X should be followed, but not preceded, by a special character or whitespace
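The note about the underscore is easy to overlook: `_` is a word character, so it does not create a word boundary next to X. A small Python sketch (the test string is made up, X is taken literally as the letter "X", and `where` is a hypothetical helper) reports the offsets at which each pattern finds X:

```python
import re

s = "X _X X_ -X-"

def where(pattern):
    """Return the start offsets of all matches of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]

print(where(r"\bX\b"))  # [0, 9]  standalone X and X between dashes
print(where(r"\BX"))    # [3]     X preceded by "_", a word character
print(where(r"X\B"))    # [5]     X followed by "_"
print(where(r"\bX\B"))  # [5]     boundary before, none after
print(where(r"\BX\b"))  # [3]     no boundary before, boundary after
```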

score 3 · Answer 8 · edited May 23 '17 at 11:47


\b matches a word boundary. \B matches non-word-boundaries, and is equivalent to ~~[^\b]~~ (?!\b) (thanks to @Alan Moore for the correction!). Both are zero-width.

See http://www.regular-expressions.info/wordboundaries.html for details. The site is extremely useful for many basic regex questions.

answered Jul 12 '11 at 12:31

Matt Ball


`\B` is **not** equivalent to `[^\b]`. A character class (`[...]` or `[^...]`) consumes exactly one character, while zero-width assertions like `\b` and `\B` don't consume anything. If you put `\b` in a character class, it takes a completely different meaning: `[\b]` matches a backspace, and `[^\b]` matches any character *except* a backspace. `\B` is really equivalent to `(?!\b)`. – Alan Moore Jul 12 '11 at 18:01
@Alan thanks, you're completely correct - I was not awake this morning when I wrote that. Fixed. – Matt Ball Jul 12 '11 at 18:17

...but why anybody would want to match a backspace is beyond me. :D – Alan Moore Jul 12 '11 at 21:17
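Both points in this thread can be checked in Python: `\B` finds the same positions as the lookahead `(?!\b)`, while `\b` inside a character class means backspace (`\x08`), not a boundary. Python's `re` documentation states that backspace meaning for `\b` inside a character range; other engines may differ in details, so treat this as a Python-specific sketch:

```python
import re

s = "Hi, cat!"

# \B and the negative lookahead (?!\b) match at the same offsets
not_boundary = [m.start() for m in re.finditer(r"\B", s)]
lookahead = [m.start() for m in re.finditer(r"(?!\b)", s)]
print(not_boundary == lookahead)  # True

# Inside a character class, \b means backspace, so each match consumes one character
text = "a\bb"                          # 'a', backspace, 'b'
print(re.findall(r"[\b]", text))   # ['\x08']
print(re.findall(r"[^\b]", text))  # ['a', 'b']
```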

score 2 · Answer 9 · answered Sep 26 '22 at 17:01

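As mentioned in https://www.regular-expressions.info/wordboundaries.html, there are three different positions that qualify as word boundaries:

1. Before the first character in the string, if the first character is a word character.
2. After the last character in the string, if the last character is a word character.
3. Between two characters in the string, where one is a word character and the other is not a word character.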

To have a better understanding of \b, I'd like to consider the string by putting the word boundaries on it using arrows.

[Array visualization of the string 'THE CAT SCATTERED']

[Array visualization of the string 'THE NINE-DIGIT COLOR - CODED PASS-KEY']

In the string THE CAT SCATTERED:

The word boundary at index 0 follows from condition 1 above.
The word boundary at index 16 follows from condition 2.
The word boundaries at indices 2, 4, 6 and 8 follow from condition 3.

In the string THE NINE-DIGIT COLOR - CODED PASS-KEY:

The word boundary at index 0 follows from condition 1.
All the remaining word boundaries follow from condition 3. Note that since the string ends with a '.' character (which is not a word character \w), condition 2 does not apply.

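A similar array visualization can be done for the non-word boundary \B using the following condition: \B matches at any position between two word characters, as well as at any position between two non-word characters.

(Credit: see @Ganesh M S's answer to the same question.)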

score 0 · Answer 10 · answered Jul 12 '11 at 12:30

\B is the negation of \b.

In color - coded there is no word boundary beside the -, so it matches \B. In pass-key, and around cat in your first example, there are word boundaries, so they match \b.

Similar rules apply to other escapes: \W is the negation of \w, and in general the uppercase shorthand class is the negation of its lowercase counterpart.
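The uppercase/lowercase pairing can be spot-checked for the shorthand character classes: on any string, `\d`/`\D`, `\w`/`\W` and `\s`/`\S` split the characters into two disjoint groups. A sketch in Python with an arbitrary sample string:

```python
import re

sample = "a_1 -!\tZ9"

for lower in "dws":
    upper = lower.upper()
    # Indices of the characters matched by the lowercase and uppercase classes
    lo = {m.start() for m in re.finditer("\\" + lower, sample)}
    hi = {m.start() for m in re.finditer("\\" + upper, sample)}
    # The two groups never overlap, and together they cover every character
    assert lo.isdisjoint(hi) and lo | hi == set(range(len(sample)))
    print(f"\\{lower}: {sorted(lo)}  \\{upper}: {sorted(hi)}")
```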
