5

I have a regular expression that escapes all special characters in a search string. This works great; however, I can't seem to get it to work with word boundaries. For example, with the haystack

add +

or

add (+)

and the needle

+

the regular expression /\+/gi matches the "+". However, the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?

Using

add (plus)

as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.

Alan Moore
ggutenberg
  • I changed the formatting of your sample strings to make the spaces in them more obvious; they're essential to understanding why your regex is failing. – Alan Moore Jul 13 '10 at 22:13

3 Answers

7

\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:

add +

...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
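To see these positions concretely, here is a small JavaScript sketch. The helper name `wordBoundaryPositions` is made up for illustration; it simply probes every index of the string with a sticky `\b` and then runs the patterns from the question.

```javascript
// Hypothetical helper: list every index in `text` where \b holds.
function wordBoundaryPositions(text) {
  const positions = [];
  for (let i = 0; i <= text.length; i++) {
    const probe = /\b/y;   // sticky flag: only try to match at lastIndex
    probe.lastIndex = i;
    if (probe.test(text)) positions.push(i);
  }
  return positions;
}

const haystack = "add +";

// Boundaries sit before the "a" (index 0) and after the second "d" (index 3).
// Index 4, between the space and the "+", is not a boundary.
console.log(wordBoundaryPositions(haystack));   // [0, 3]

console.log(/\+/.test(haystack));               // true
console.log(/\b\+/.test(haystack));             // false
console.log(/\bplus/i.test("add (plus)"));      // true
```

The last line succeeds because "p" is a word character and "(" is not, so there is a boundary right where `\b` needs one.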

Alan Moore
  • 2
    That makes a lot of sense actually. I take it then that there's no way to achieve what I want to without creating a list of pre-defined characters (such as `\s`, `\(`, `\[`, etc.) to match against before the `+`? – ggutenberg Jul 13 '10 at 22:05
  • 1
    Do you want to match anything that's *not* a word character? That would be `\W` (capital 'w'). Or you can use `\B` to assert that the `+` is not preceded by a word character. – Alan Moore Jul 13 '10 at 22:18
  • 1
    Definitely not *not* a word character, but your explanation of the actual usage of `\b` gave me a kick in the right direction. I'm now doing a JS check on the first character of my regex. If it's a `\\` I don't append the `\b`. If it isn't, I do. Seem to be getting the results I wanted. Thanks. – ggutenberg Jul 13 '10 at 22:24
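The workaround described in the last comment could look roughly like the sketch below. It is an assumption about the approach, not the asker's actual code: the names `escapeRegExp` and `buildSearchRegex` are hypothetical, and instead of checking the escaped pattern for a leading backslash it tests whether the raw needle starts with a word character, so that punctuation the escaper leaves alone (such as `#`) is also handled.

```javascript
// Escape regex metacharacters so the needle is matched literally.
function escapeRegExp(needle) {
  return needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: prepend \b only when the needle starts with a
// word character, because \b in front of a non-word character would
// require a word character immediately before it.
function buildSearchRegex(needle) {
  const prefix = /^\w/.test(needle) ? "\\b" : "";
  return new RegExp(prefix + escapeRegExp(needle), "gi");
}

console.log(buildSearchRegex("+").source);               // \+
console.log(buildSearchRegex("plus").source);            // \bplus
console.log(buildSearchRegex("+").test("add (+)"));      // true
console.log(buildSearchRegex("plus").test("surplus"));   // false
```

Note that with this approach a needle such as `+` is matched anywhere in the haystack, including in the middle of `a+b`, since no boundary condition is applied to it.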
0

Try changing it to:

/\b\s?\+/gi

Edit:

Extend this concept as far as you want. If you want the first + after any word boundary:

/\b[^+]*\+/gi
riwalk
  • That works in the specific example I gave, but doesn't account for word boundaries properly. For example, it doesn't work on 'add (+)' as the haystack. – ggutenberg Jul 13 '10 at 21:35
  • I edited my answer to account for a more general case, but if that's not what you're looking for, then you need to be more specific on what you want. – riwalk Jul 13 '10 at 21:41
  • Not sure how to be more specific. I need to use the word boundary \b to specify that the special character is at the beginning of a word. I updated the question to include 'add (+)' as an example, but there are obviously dozens more where some character (other than whitespace) designates a word boundary. – ggutenberg Jul 13 '10 at 21:44
  • Then I'm sorry, but I can't help you figure out what you need. – riwalk Jul 13 '10 at 21:48
0

Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
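One common way to get a "start of word" test that does not depend on what the needle begins with is a negative lookbehind, `(?<!\w)`, in place of `\b`. The sketch below is only an illustration under stated assumptions: it needs an engine with lookbehind support (ES2018 or later), and the names `escapeLiteral` and `notAfterWordRegex` are hypothetical.

```javascript
// Escape regex metacharacters so the needle is matched literally.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: match `needle` only where it is NOT preceded by
// a word character. Unlike \b, this does not care whether the needle
// itself starts with a word character.
function notAfterWordRegex(needle) {
  return new RegExp("(?<!\\w)" + escapeLiteral(needle), "gi");
}

console.log("add (+)".match(notAfterWordRegex("+")));        // ["+"]
console.log("a+b".match(notAfterWordRegex("+")));            // null
console.log("add (plus)".match(notAfterWordRegex("plus")));  // ["plus"]
console.log("surplus".match(notAfterWordRegex("plus")));     // null
```

For needles that start with a word character this behaves like `\b`; for needles that start with punctuation it treats "not preceded by a word character" as the boundary, which may or may not be the rule you want.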

Community
tchrist