How to understand regex '\b'?

Question

I am learning regular expressions, but I can't understand '\b', which matches a word boundary. There are three situations where it matches:

Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
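All three rules reduce to one test: a position is a boundary exactly when the characters on its left and right differ in being word characters, with "outside the string" counting as a non-word character. Below is a minimal JavaScript sketch of that test, cross-checked against the engine's own `\b`. The helper names `isWordChar`, `isWordBoundary` and `boundaryPositions` are made up for this illustration, and `matchAll` assumes an ES2020-or-later engine.

```javascript
// In JavaScript, \w is the ASCII class [A-Za-z0-9_].
// Positions outside the string (str[-1], str[str.length]) are undefined
// and count as non-word characters.
function isWordChar(ch) {
  return ch !== undefined && /\w/.test(ch);
}

// Position i sits between str[i - 1] and str[i] (0 <= i <= str.length).
// It is a boundary when exactly one neighbour is a word character, which
// covers all three rules: start of string, end of string, and in between.
function isWordBoundary(str, i) {
  return isWordChar(str[i - 1]) !== isWordChar(str[i]);
}

function boundaryPositions(str) {
  const positions = [];
  for (let i = 0; i <= str.length; i++) {
    if (isWordBoundary(str, i)) positions.push(i);
  }
  return positions;
}

const sample = "end,end";
console.log(JSON.stringify(boundaryPositions(sample))); // [0,3,4,7]

// Cross-check with the regex engine: every /\b/g match is empty,
// and its index is the boundary position.
console.log(JSON.stringify([...sample.matchAll(/\b/g)].map((m) => m.index))); // [0,3,4,7]
```

Positions 0 and 7 come from the first two rules; 3 and 4, on either side of the comma, come from the third. The comma itself is never a boundary, only the two gaps next to it.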

I can't understand the third situation. For example:

var reg = /end\bend/g;
var string = 'wenkend,end,end,endend';
alert(reg.test(string)); // false

The '\b' requires a '\w' character on one side and a non-'\w' character on the other. The string 'end,end' should satisfy that rule: the first 'end' is followed by ',', and the last 'end' is preceded by ','. So why is the result false? Could you help? Thanks in advance!

============dividing line=============

With your help, I understand it now. In 'end,end', the regex matches the first 'end' and the boundary after it, but the next character is ',' rather than 'e', so /end\bend/ fails.

In other words, /end\bend/g and similar regexes can never match anything. Thanks again!

`\b` doesn't match a character, it matches a spot between characters, a boundary. It's impossible for there to be a word boundary there when the two characters beside the `\b` are both word characters. The regex you're perhaps thinking of is `/end\Wend/g` — 4castle, Oct 28 '16 at 05:41
@Steve I know '\b' captures nothing and is just a boundary, so 'end,end' should match, because the ',' is just a boundary. — Anan, Oct 28 '16 at 05:43
@Anan: The comma is not a boundary. There are boundaries immediately before and after the comma, but the comma itself is not a boundary. — user2357112, Oct 28 '16 at 05:44
A comma is not a boundary. The boundary is the null-width substring between `d` and `,`. — Amadan, Oct 28 '16 at 05:44
I think you should refer to this link: http://www.regular-expressions.info/wordboundaries.html — Jyupin, Oct 28 '16 at 05:44
@Jan: Explaining a word boundary using lookarounds is like potty-training by using OSHA manuals. I don't think it's very useful :P — Amadan, Oct 28 '16 at 05:46
@4castle So does that mean /end\bend/g and similar regexes can never match anything? — Anan, Oct 28 '16 at 05:56
@Anan Right, `/end\bend/` can't match anything, because it would match `endend`, but then there isn't a word boundary between the middle `de`, so the match fails. — 4castle, Oct 28 '16 at 05:58
@4castle OK, I don't really want to use /end\Wend/g; I just want to understand \b. I understand now, thanks very much :) — Anan, Oct 28 '16 at 06:04
See http://stackoverflow.com/questions/39875620/python-regex-words-boundary-with-unexpected-results/39876126#39876126 — Wiktor Stribiżew, Oct 28 '16 at 06:04

Max Koretskyi · Accepted Answer · 2016-10-28T06:23:26.520

\b matches a position, not a character. The regex /end\bend/g says there must be the string end, followed by a word boundary. Since d is a word character, that boundary exists only if the next character is not a word character, which is true for the comma, so \b matches. But the regex engine doesn't move forward in the string; it stays at the comma. The next token in the regex is e, which doesn't match the comma, so the match fails. Here is what happens step by step:

-----------------
/end\bend/g,   "end,end"        (match)
   |              |
-----------------

/end\bend/g,   "end,end"        (both regex and string positions moved: match)
     |             |
------------------

/end\bend/g,   "end,end"        (the previous match was zero-length, so only the regex position moved: no match)
      |            |
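The walkthrough above can be checked directly in JavaScript. A small sketch on the question's string: `/end\bend/` fails, the boundary between `d` and the comma does exist, and the pattern succeeds once a real token consumes the comma. `/end\b,\bend/g` is only an illustrative variant; `/end\Wend/g` is the one 4castle suggested in the comments.

```javascript
const questionInput = "wenkend,end,end,endend";

// \b consumes nothing: after "end" the engine still stands in front of ",",
// so the next token "e" fails. And a boundary between "d" and "e" is
// impossible, because both are word characters.
console.log(/end\bend/.test(questionInput)); // false

// The boundary between "d" and "," does exist; the comma just has to be
// consumed by a real token after the assertion.
console.log(/d\b,/.test(questionInput)); // true

// Matching the separator explicitly makes the pattern succeed.
console.log(JSON.stringify(questionInput.match(/end\b,\bend/g))); // ["end,end","end,end"]
console.log(JSON.stringify(questionInput.match(/end\Wend/g))); // ["end,end","end,end"]
```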

Jan · Answer 2 · 2016-10-28T06:09:18.460

With (most) regular expression engines, you can match, capture characters and assert positions within a string.

For the purpose of this example let's assume the string

Rogue One: A Star Wars Story

where you want to match the character o (which appears twice in lowercase: after R and after t). Now you want to restrict the position and match an o only when it is directly followed by a lowercase r.
You write (with a positive lookahead):

o(?=r)
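The same lookahead can be tried in JavaScript. A short sketch, assuming an engine with `String.prototype.matchAll` (ES2020): only the o in Story is followed by r, and the match is the single character o, because the lookahead inspects the r without consuming it.

```javascript
const rogueTitle = "Rogue One: A Star Wars Story";

// o(?=r): match "o" only where the next character is "r".
// The lookahead looks at the "r" but does not make it part of the match.
const lookaheadHits = [...rogueTitle.matchAll(/o(?=r)/g)];

console.log(JSON.stringify(lookaheadHits.map((m) => m.index))); // [25]  (the "o" in "Story")
console.log(JSON.stringify(lookaheadHits.map((m) => m[0]))); // ["o"]
```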

Now generalize the idea of zero-width assertions: you want to look for a word character ahead while making sure there's no word character immediately behind. For this you could write:

(?=\w)(?<!\w)

A positive lookahead and a negative lookbehind, combined. We're almost there :) You only need the same thing the other way around (a word character behind and no word character ahead), which is:

(?<=\w)(?!\w)

If you combine these two, you'll eventually get (see the | in the middle):

(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))

This is equivalent to \b, just a lot longer. Coming back to our string, it holds at these positions:

 Rogue One: A Star Wars Story
 # right before R
 # right after e in Rogue
 # right before O of One
 # right after e of One (: is not a word character)
 # and so on...

See a demo on regex101.com.
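A sketch that checks the claimed equivalence on the example string, assuming a JavaScript engine with lookbehind (ES2018) and `matchAll` (ES2020), such as a current Node.js or browser. Both patterns should report the same zero-width positions, starting with 0, 5, 6 and 9 from the list above; `positionsOf` is a hypothetical helper name.

```javascript
const storyTitle = "Rogue One: A Star Wars Story";

// Long form from the answer: "word character ahead and none behind" OR
// "word character behind and none ahead". (?<!...) and (?<=...) are
// lookbehinds, available in JavaScript since ES2018.
const longBoundary = /(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))/g;
const shortBoundary = /\b/g;

// Both patterns only ever match the empty string, so each match is
// described completely by its index.
const positionsOf = (re) => [...storyTitle.matchAll(re)].map((m) => m.index);

console.log(JSON.stringify(positionsOf(longBoundary)));
// [0,5,6,9,11,12,13,17,18,22,23,28]
console.log(JSON.stringify(positionsOf(longBoundary)) === JSON.stringify(positionsOf(shortBoundary)));
// true
```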

To conclude, you can think of \b as a zero-width assertion: it only checks a position within the string and consumes no characters.
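One way to see that `\b` only marks positions: replace every boundary with a marker, and characters get inserted but none removed. A short JavaScript sketch (the `|` marker is an arbitrary choice):

```javascript
const pair = "end,end";

// Every /\b/g match is empty, so replace() inserts "|" at each boundary
// and deletes nothing: the comma is still there afterwards.
const marked = pair.replace(/\b/g, "|");

console.log(marked); // |end|,|end|
console.log(marked.length - pair.length); // 4  (one "|" per boundary)
```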

FYI, [tchrist's answer](http://stackoverflow.com/a/4215293/3832970) also describes that "conditional" (actually, contextual) word boundary behavior. — Wiktor Stribiżew, Oct 28 '16 at 06:10

score 0 · Answer 3 · answered Oct 28 '16 at 05:45

Try this expression:

/(end)\b|\b(end)/g

Harshit Gohil

To get a clear idea of how regular expressions work – Harshit Gohil Oct 28 '16 at 05:46
The goal here isn't to hand them a fish, it's to teach the person how to fish. This answer is not helpful. – 4castle Oct 28 '16 at 05:50
That's why I posted the website link where I learned regular expressions... – Harshit Gohil Oct 28 '16 at 05:54
She can try everything related to regex over there. – Harshit Gohil Oct 28 '16 at 05:55

Right, those links are certainly helpful in learning regex, but your answer is not helpful. It's a signpost that says, *"the answer is hidden away over here --->"* – 4castle Oct 28 '16 at 06:01
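For comparison, a quick JavaScript check of what the pattern from this answer, `/(end)\b|\b(end)/g`, actually matches on the question's string (assuming an ES2020 engine for `matchAll`). It finds every end with a word boundary on at least one side, including the one inside wenkend and both halves of endend, so it does not express "end, boundary, end" and does not explain why `/end\bend/` fails.

```javascript
const answerInput = "wenkend,end,end,endend";

// Alternation: "end" followed by a boundary, OR "end" preceded by one.
const alternationHits = [...answerInput.matchAll(/(end)\b|\b(end)/g)];

// Every match is a bare "end"; the comma is never part of a match.
console.log(JSON.stringify(alternationHits.map((m) => m[0]))); // ["end","end","end","end","end"]
console.log(JSON.stringify(alternationHits.map((m) => m.index))); // [4,8,12,16,19]

// The filled group shows which side had the boundary: at index 16 only
// group 2 is set (boundary before), at index 19 group 1 (boundary after).
console.log(alternationHits[3][1], alternationHits[3][2]); // undefined end
console.log(alternationHits[4][1], alternationHits[4][2]); // end undefined
```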
