I've spent some time on this but still have no solution. I need a regular expression that can match a word containing symbols (like c++) in a string.

I've used /\bword\b/ for "usual" words, and it works fine. But as soon as I try /\bC\+\+\b/, it does not work. It somehow goes wrong with the plus signs in it.

I need a regex to detect whether an input string contains the word c++. Inputs look like:

"c++ developer"
"using c++ language" 

etc.

PS: I'm using C# and the .NET Regex.Match function.

Thanks for help!

codaddict
Alexander Beletsky

5 Answers

+ is a special character, so you need to escape it:

\bC\+\+(?!\w)

Note that we can't use \b after the last +, because + is not a word character; (?!\w) instead asserts that no word character follows.
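
As a minimal sketch of how this pattern could be used from C#: the class and method names below are hypothetical, and RegexOptions.IgnoreCase is an assumption added because the sample inputs in the question use a lowercase c++.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CppWordDemo
{
    // \b before C: the C must start a word.
    // (?!\w) after the last +: no word character may follow.
    // IgnoreCase is an assumption so that lowercase "c++" also matches.
    private static readonly Regex CppWord =
        new Regex(@"\bC\+\+(?!\w)", RegexOptions.IgnoreCase);

    public static bool ContainsCpp(string input) => CppWord.IsMatch(input);

    public static void Main()
    {
        var samples = new[] { "c++ developer", "using c++ language", "c++builder", "abc++" };
        foreach (var s in samples)
        {
            // The first two samples match; the last two do not.
            Console.WriteLine($"{s}: {ContainsCpp(s)}");
        }
    }
}
```

"c++builder" is rejected by the lookahead (a word character follows the last +), and "abc++" is rejected by the leading \b (there is no boundary between b and c).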

kennytm

The problem isn't with the plus character, which you've escaped correctly, but with the \b sequence. It indicates a word boundary, which is a point between a word character (a letter, digit, or underscore) and something else. Plus isn't a word character, so for \b to match, there would need to be a word character directly after the last plus sign.

\bC\+\+\b matches "Test C++Test" but not "Test C++ Test", for example. Try something like \bC\+\+\s if you expect whitespace after the last plus sign (note that this will not match when C++ is at the very end of the string).
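
The boundary behaviour described above can be checked with a short sketch. The class and method names are hypothetical; the two patterns and the first two test strings come from this answer, and "I like C++" is an extra input added for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingBoundaryDemo
{
    // Trailing \b needs a word character right after the last '+'.
    public static bool WithTrailingBoundary(string input) =>
        Regex.IsMatch(input, @"\bC\+\+\b");

    // Trailing \s needs an actual whitespace character after the last '+'.
    public static bool WithTrailingWhitespace(string input) =>
        Regex.IsMatch(input, @"\bC\+\+\s");

    public static void Main()
    {
        Console.WriteLine(WithTrailingBoundary("Test C++Test"));    // True
        Console.WriteLine(WithTrailingBoundary("Test C++ Test"));   // False
        Console.WriteLine(WithTrailingWhitespace("Test C++ Test")); // True
        Console.WriteLine(WithTrailingWhitespace("I like C++"));    // False: nothing follows the last '+'
    }
}
```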

Jakob Borg

The plus sign has a special meaning, so you have to escape it with \. The same rule applies to these characters: \, *, +, ?, |, {, [, (, ), ^, $, ., #, and white space.
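
Rather than escaping these characters by hand, .NET's Regex.Escape can do it. A minimal sketch follows; the class and method names are hypothetical, and only Regex.Escape and Regex.IsMatch are real library calls.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Returns the literal text with regex metacharacters backslash-escaped.
    public static string ToLiteralPattern(string literal) => Regex.Escape(literal);

    public static void Main()
    {
        Console.WriteLine(ToLiteralPattern("c++"));   // c\+\+
        Console.WriteLine(ToLiteralPattern("a.b*c")); // a\.b\*c

        // The escaped text can be embedded in a larger pattern.
        string pattern = "^" + ToLiteralPattern("c++") + "$";
        Console.WriteLine(Regex.IsMatch("c++", pattern)); // True
    }
}
```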

UPDATE: the problem was with the \b sequence.

Viktor Stískala

If you want to match c++ between non-word chars (chars other than letters, digits, and underscores), you may use

\bc\+\+\B

Here, \b is a word boundary and \B matches all positions that are not word boundary positions.

C# syntax:

var pattern = @"\bc\+\+\B";

You must remember that \b / \B are context dependent: \b matches between the start/end of the string and an adjoining word char, or between a word char and a non-word char, while \B matches between the start/end of the string and an adjoining non-word char, or between two word chars or two non-word chars.

If you build the pattern dynamically, it is hard to rely on the word boundary \b pattern.

Use adaptive dynamic word boundaries, the (?!\B\w) and (?<!\w\B) lookarounds, instead. They will always match a word not immediately preceded/followed by a word char if the word starts/ends with a word char:

var pattern = $@"(?!\B\w){Regex.Escape(word)}(?<!\w\B)";

If the word boundaries you want to match are whitespace boundaries (i.e. the match is expected only between whitespaces), use

var pattern = $@"(?<!\S){Regex.Escape(word)}(?!\S)";
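
The two dynamic patterns above can be wrapped in small helpers to compare their behaviour. This is only a sketch: the class and method names are hypothetical, and the input strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DynamicBoundaryDemo
{
    // Adaptive boundaries: a boundary is required only on a side where
    // the word itself starts or ends with a word character.
    public static Regex Adaptive(string word) =>
        new Regex($@"(?!\B\w){Regex.Escape(word)}(?<!\w\B)");

    // Whitespace boundaries: the word must be delimited by whitespace
    // or by the start/end of the string.
    public static Regex WhitespaceDelimited(string word) =>
        new Regex($@"(?<!\S){Regex.Escape(word)}(?!\S)");

    public static void Main()
    {
        Console.WriteLine(Adaptive("c++").IsMatch("using c++ language"));     // True
        Console.WriteLine(Adaptive("c++").IsMatch("abc++"));                  // False: 'c' is inside a word
        Console.WriteLine(Adaptive("c++").IsMatch("c++builder"));             // True: "c++" ends with '+', so no trailing check
        Console.WriteLine(Adaptive("word").IsMatch("swordfish"));             // False
        Console.WriteLine(WhitespaceDelimited("c++").IsMatch("c++builder"));  // False
        Console.WriteLine(WhitespaceDelimited("c++").IsMatch("(c++)"));       // False: parentheses are not whitespace
    }
}
```

Because c++ ends with a non-word character, the adaptive variant places no constraint on what follows it; the whitespace variant is the stricter choice when c++builder should not count as a match.
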
Wiktor Stribiżew

As the others said, your problem isn't the + sign, which you've escaped correctly, but the \b, a zero-width assertion that matches a word boundary, i.e. the position between a word char (\w) and a non-word char (\W).

There is also another mistake in your regex: you are trying to match an uppercase C against a lowercase c++. To fix this, change your regex to /\bc\+\+/ or use the i modifier to match case-insensitively: /\bc\+\+/i

Toto