I'm seeing strange behaviour with the regex \bC#?\b:

string s1 = Regex.Replace("Bla Ca bla", @"\bCa?\b", "[$0]"); // Bla [Ca] bla (as expected)
string s2 = Regex.Replace("Bla C# bla", @"\bC#?\b", "[$0]"); // Bla [C]# bla (why???)

Does anyone understand why this happens, and how to match an optional # at the end?

Aximili

1 Answer

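Because \b marks a word boundary, and in regexes a word is a sequence of alphanumeric characters (plus the underscore; see here). No other characters count. In the first example a is a letter, so Ca is a word and the closing \b matches right after it. In the second, # is not a word character, so the word consists of only C. There is no boundary between # and the following space (both are non-word characters), so the closing \b cannot match after C#; the engine backtracks, drops the optional #, and matches \b between C and # instead.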
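To make this concrete, here is a small sketch that lists every index at which \b matches in two sample strings. The helper name BoundaryPositions is made up for illustration; only Regex.Matches and the \b anchor come from .NET itself.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns the indices at which \b matches in the input.
    public static int[] BoundaryPositions(string input) =>
        Regex.Matches(input, @"\b").Cast<Match>().Select(m => m.Index).ToArray();

    public static void Main()
    {
        // "Ca bla": boundaries before C, after a, before b, and at the end.
        Console.WriteLine(string.Join(", ", BoundaryPositions("Ca bla"))); // 0, 2, 3, 6

        // "C# bla": there is a boundary between C and # (index 1),
        // but none between # and the space (index 2).
        Console.WriteLine(string.Join(", ", BoundaryPositions("C# bla"))); // 0, 1, 3, 6
    }
}
```

Index 2 is missing from the second list, which is why C# followed by \b can never match in that string.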

To see the difference, try removing \b:

string s2 = Regex.Replace("Bla C# bla", @"C#?", "[$0]"); // Bla [C#] bla

If you need a \b-like boundary, check out this thread for some suggestions.
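One possible approach, not necessarily the one the linked thread recommends, is to keep the leading \b and replace the trailing one with a negative lookahead, (?!\w), which only requires that no word character follows. This is a sketch under that assumption; the helper name Mark is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalHashDemo
{
    // Hypothetical helper: brackets "C" or "C#" when no word character follows it.
    public static string Mark(string input) =>
        Regex.Replace(input, @"\bC#?(?!\w)", "[$0]");

    public static void Main()
    {
        Console.WriteLine(Mark("Bla C# bla")); // Bla [C#] bla
        Console.WriteLine(Mark("Bla C bla"));  // Bla [C] bla
        Console.WriteLine(Mark("Bla Ca bla")); // Bla Ca bla (no match, a follows C)
    }
}
```

Note that with this pattern an input like C#x still matches the bare C, because the engine can give up the optional # and the lookahead then sees # rather than a word character. If that matters, a stricter lookahead such as (?![\w#]) may be a better fit.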

Andrei