Why is "\b" not catching my word boundary in Rails?

Question

I’m using Rails 4.2.7. I want to match an expression and take advantage of the word boundary expression, because there may be a space, an arbitrary number of spaces, or a dash between my words. But this isn’t working:

2.3.0 :006 > /first\bsecond/.match('first second')
 => nil

The manual (https://ruby-doc.org/core-2.1.1/Regexp.html) suggests that “\b” is the right expression to use to catch word boundaries, so I’m wondering where I’m going wrong.

See http://stackoverflow.com/questions/39875620/python-regex-words-boundary-with-unexpected-results/39876126#39876126. `\b` is not matching whitespaces and there is one between `first` and `second`. — Wiktor Stribiżew, Oct 12 '16 at 20:02
`\b` is usually used at the beginning or end of an expression only since its utility in other places is questionable. — tadman, Oct 12 '16 at 20:12
@tadman there's a chance that `\b` can be useful in the middle of an expression, but it's so rare that I can't think of a real world use case. Something like this is an efficient use: `/[a-z ]\b[a-z ]/` — Sam, Oct 12 '16 at 20:20
@Sam Yeah, the cases of using it in the middle are typically more like `/test\b|thing/` but in those cases they're at the beginning or end of a sub-expression. — tadman, Oct 12 '16 at 20:28
This appears to be a pure-Ruby question. If so, there should be no Rails tag. Keep in mind that some members may filter out questions with Rails tags. — Cary Swoveland, Oct 12 '16 at 20:38

Sam · Accepted Answer · 2016-10-12T20:09:35.883

\b matches a zero-length word boundary, not a space. You're looking for something more like this:

/first\b.\bsecond/.match('first second')

This will match any character (.) in between first and second, as long as there is a word boundary on either side.
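As a minimal sketch of the zero-length point (the constant names `NO_GAP` and `WITH_GAP` are just illustrative, not from the question), the snippet below compares the original pattern with the one above. `\b` only asserts a position, so something else still has to consume the separator:

```ruby
# \b consumes no characters: it only asserts a position between a word
# character (\w) and a non-word character, or at the start/end of the string.
NO_GAP   = /first\bsecond/     # the pattern from the question
WITH_GAP = /first\b.\bsecond/  # boundary, one character, boundary

p NO_GAP.match('first second')    # nil: nothing in the pattern consumes the space
p NO_GAP.match('firstsecond')     # nil: no boundary between "t" and "s"
p WITH_GAP.match('first second')  # matches "first second"
p WITH_GAP.match('first-second')  # matches "first-second"
```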

However, this is not how word boundaries are usually used (there is no need for a zero-length check when you are already matching the separating character itself). \b matches the position between a word character and a non-word character (or the start or end of the string); so, instead, you could just look for a non-word character between the t in first and the s in second:

/first\Wsecond/.match('first second')

This behaves the same as the first example (except that \W also matches a newline, which . does not by default)... but, realistically, you probably just want to match whitespace and can use something like this:

/first\ssecond/.match('first second')

@WiktorStribiżew's third example shows the best use of word boundaries (at the beginning and end). This is because you aren't matching anything before or after, so a zero-length test is helpful. Otherwise, the above examples could match something like first secondary. In the end, I'd use an expression like:

/\bfirst\ssecond\b/.match('first second')
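
To see what the outer boundaries buy you, here is a small comparison, offered as a sketch only. The constant names and the sample strings (including "headfirst second") are made up for illustration:

```ruby
UNANCHORED = /first\ssecond/
ANCHORED   = /\bfirst\ssecond\b/

# Returns true when the pattern matches anywhere in the text.
def hit?(pattern, text)
  !pattern.match(text).nil?
end

['first second', 'first secondary', 'headfirst second'].each do |text|
  puts "#{text.inspect}: unanchored=#{hit?(UNANCHORED, text)} anchored=#{hit?(ANCHORED, text)}"
end
```

Only the exact phrase passes the anchored pattern, while the unanchored one also accepts the two longer strings.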

edited Oct 12 '16 at 20:09

answered Oct 12 '16 at 20:03

Sam

Or `/first\ssecond/` or `/first second/` or `/\bfirst second\b/` – Wiktor Stribiżew Oct 12 '16 at 20:04
Thanks @WiktorStribiżew, I was just updating the question to explain why you wouldn't usually use a zero-length match in a situation like this. – Sam Oct 12 '16 at 20:04
Thanks, but the answer given fails for the case /first\Wsecond/.match('first  second'), in which there is more than one space between the words. – Dave Oct 12 '16 at 20:08
@Dave any of these examples can make use of the `+` modifier to say "one or more", i.e. `/first\W+second/`... – Sam Oct 12 '16 at 20:11
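
A rough sketch of the `+` suggestion, assuming the separators are the ones named in the question (one or more spaces, or a dash). The constant names and the character class `[\s\-]` are my own choices for illustration, not something given in the thread:

```ruby
# One or more non-word characters between the words, as in the comment above.
ANY_NON_WORD = /\bfirst\W+second\b/

# Narrower assumption: only whitespace or dashes count as separators.
SPACE_OR_DASH = /\bfirst[\s\-]+second\b/

['first second', 'first   second', 'first-second', 'firstsecond', 'first, second'].each do |text|
  any    = !ANY_NON_WORD.match(text).nil?
  narrow = !SPACE_OR_DASH.match(text).nil?
  puts "#{text.inspect}: \\W+ => #{any}, [\\s\\-]+ => #{narrow}"
end
```

The two patterns differ only on input such as "first, second", where `\W+` accepts the comma and the narrower class does not.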
