I’m using Rails 4.2.7. I want to match an expression using the word-boundary anchor, because there may be a single space, an arbitrary number of spaces, or a dash between my words. But this isn’t working:

2.3.0 :006 > /first\bsecond/.match('first second')
 => nil

The manual at https://ruby-doc.org/core-2.1.1/Regexp.html suggests that “\b” is the right expression to use to match word boundaries, so I’m wondering where I’m going wrong.

Dave
  • See http://stackoverflow.com/questions/39875620/python-regex-words-boundary-with-unexpected-results/39876126#39876126. `\b` does not match whitespace, and there is a space between `first` and `second`. – Wiktor Stribiżew Oct 12 '16 at 20:02
  • `\b` is usually used at the beginning or end of an expression only since its utility in other places is questionable. – tadman Oct 12 '16 at 20:12
  • @tadman there's a chance that `\b` can be useful in the middle of an expression, but it's so rare that I can't think of a real world use case. Something like this is an efficient use: `/[a-z ]\b[a-z ]/` – Sam Oct 12 '16 at 20:20
  • @Sam Yeah, the cases of using it in the middle are typically more like `/test\b|thing/` but in those cases they're at the beginning or end of a sub-expression. – tadman Oct 12 '16 at 20:28
  • This appears to be a pure-Ruby question. If so, there should be no Rails tag. Keep in mind that some members may filter out questions with Rails tags. – Cary Swoveland Oct 12 '16 at 20:38

1 Answer


\b matches a zero-length word boundary, not a space. You're looking for something more like this:

/first\b.\bsecond/.match('first second')

This will match any character (.) in between first and second, as long as there is a word boundary on either side.
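To make the "zero-length" point concrete, here is a small sketch that lists where \b matches in the question's string. The helper name boundary_offsets is made up for illustration; it is not part of Ruby.

```ruby
# Collect the offsets at which \b matches. Each match is an empty string,
# because \b asserts a position and consumes no characters.
# (boundary_offsets is a hypothetical helper name.)
def boundary_offsets(text)
  offsets = []
  text.scan(/\b/) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p boundary_offsets('first second')            # => [0, 5, 6, 12]

# The space sits between offsets 5 and 6, and \b never consumes it,
# so "first" can never be followed directly by "second".
p(/first\bsecond/.match('first second'))      # => nil

# Once something consumes the space, the match succeeds.
p(/first\b \bsecond/.match('first second'))
```

The boundaries fall on either side of the space, which is why the separator still has to be matched by something else.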


However, this is not how word boundaries are usually used: there is no need for a zero-length check when you are already matching the character that creates the boundary. \b matches the position between a word character and a non-word character (or the start or end of the string), so instead you could just look for a non-word character between the t in first and the s in second:

/first\Wsecond/.match('first second')

This behaves the same as the first example, except that \W also matches a newline while . (without the m flag) does not. Realistically, though, you probably just want to match whitespace and can use something like this:

/first\ssecond/.match('first second')
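As a rough comparison of the three patterns so far, the sketch below runs each one against a space, a dash (which the question mentions), and a newline. The sample strings and the matches? helper are my own additions for illustration.

```ruby
# The three patterns discussed above, keyed by a readable label.
PATTERNS = {
  'first\b.\bsecond' => /first\b.\bsecond/,
  'first\Wsecond'    => /first\Wsecond/,
  'first\ssecond'    => /first\ssecond/
}

# Sample inputs (assumed for illustration): space, dash, newline.
SAMPLES = ['first second', 'first-second', "first\nsecond"]

# Hypothetical helper: true when the pattern matches anywhere in text.
def matches?(pattern, text)
  !pattern.match(text).nil?
end

SAMPLES.each do |text|
  PATTERNS.each do |label, pattern|
    puts format('%-16s %-18s %s', text.inspect, label, matches?(pattern, text))
  end
end
```

The dash is matched by \W and by \b.\b but not by \s, and the newline is matched by \W and \s but not by ., so which one to pick depends on which separators you need to accept.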

@WiktorStribiżew's third example shows the best use of word boundaries (at the beginning and end). Nothing is matched before first or after second, so a zero-length test is helpful there. Without it, the examples above would also match inside something like first secondary. In the end, I'd use an expression like:

/\bfirst\ssecond\b/.match('first second')
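
A quick sketch of what the outer \b anchors buy you. The constant names and the extra input 'headfirst second' are assumptions added for illustration.

```ruby
UNANCHORED = /first\ssecond/
ANCHORED   = /\bfirst\ssecond\b/

['first second', 'first secondary', 'headfirst second'].each do |text|
  unanchored_hit = !UNANCHORED.match(text).nil?
  anchored_hit   = !ANCHORED.match(text).nil?
  puts "#{text.inspect}: unanchored=#{unanchored_hit} anchored=#{anchored_hit}"
end
```

The unanchored pattern matches all three strings because it only needs "first second" to appear somewhere; the anchored one accepts only the first, since the other two have a word character directly before "first" or after "second".
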
Sam
  • Or `/first\ssecond/` or `/first second/` or `/\bfirst second\b/` – Wiktor Stribiżew Oct 12 '16 at 20:04
  • Thanks @WiktorStribiżew, I was just updating the question to explain why you wouldn't usually use a zero-length match in a situation like this. – Sam Oct 12 '16 at 20:04
  • Thanks, but the answer given fails for cases like /first\Wsecond/.match('first  second'), in which there is more than one space between the words. – Dave Oct 12 '16 at 20:08
  • @Dave any of these examples can make use of the `+` modifier to say "one or more", i.e. `/first\W+second/`... – Sam Oct 12 '16 at 20:11
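
Putting the `+` suggestion together with the question's original requirement (one space, several spaces, or a dash), one possible pattern is sketched below. The character class `[\s\-]` and the constant name SEPARATED are my own assumptions, not something from the answer; widen or narrow the class to suit your data.

```ruby
# Hypothetical pattern: "first", then one or more whitespace characters
# or dashes, then "second", with word boundaries on the outside.
SEPARATED = /\bfirst[\s\-]+second\b/

[
  'first second',     # single space
  'first   second',   # several spaces
  'first-second',     # dash
  'first - second',   # mixed
  'firstsecond',      # no separator
  'first secondary'   # trailing word characters
].each do |text|
  puts "#{text.inspect}: #{!SEPARATED.match(text).nil?}"
end
```

The first four strings match; 'firstsecond' fails because `+` requires at least one separator, and 'first secondary' fails on the closing \b.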