
To begin, I would like to note that a similar question exists with answers and workarounds specific to PHP. I am seeing this issue in C# and I would like to understand the logic behind this apparent "gotcha".

The word boundary \b doesn't seem to work properly when placed inside a regex character class (a set, aka "box brackets": []). Is this a syntactic issue, are word boundaries intentionally excluded from character classes, or is there some other explanation I'm missing?

Here is a program to demonstrate the issue:

namespace TestProgram
{
    using System.Text.RegularExpressions;
    using System.Diagnostics;
    class Program
    {
        static void Main(string[] args)
        {
            var text = "[abc]";
            var BaselineRegex = new Regex(@"(?:\b)(abc)");
            Debug.Assert(BaselineRegex.IsMatch(text)); // Assertion Passes
            var BracketRegex = new Regex(@"(?:[\b])(abc)");
            Debug.Assert(BracketRegex.IsMatch(text)); // Assertion Fails!
        }
    }
}

Here are web versions to demonstrate as well:

  • Word boundary performing as expected without brackets: (link)

  • Word boundary failing to match when placed inside brackets: (link)

Jake
  • `[\b]` is a backspace char matching pattern, that is all. All zero-width assertions lose their meanings of zero-width assertions inside character classes. – Wiktor Stribiżew May 22 '19 at 22:42
  • Because there is no concrete, consumable assertion you can put in a character class, because, well, they're not characters. –  May 22 '19 at 22:43
  • Note how `^` and `$` do not function as beginning/end of string anchors within square brackets either. Character classes must at least be quantifiable. – CAustin May 22 '19 at 23:08
  • A better question may be what is your true goal to have an anchor in a character set? I would surmise it is more how you setup your pattern that is the issue, and changing how you search might be more fruitful. – ΩmegaMan May 22 '19 at 23:50
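A minimal sketch of what the comments above describe, written as C# top-level statements (this assumes C# 9 or later; the sample strings and variable names are made up for illustration). Inside square brackets the anchors stop being anchors, and when the real goal is "an anchor or some character", an alternation in a non-capturing group is one way to express it:

```csharp
using System;
using System.Text.RegularExpressions;

// Inside a class, '$' is an ordinary literal character, not an end anchor.
bool dollarLiteral = Regex.IsMatch("cost: 5$", "[$]");      // true

// '^' is special only as the first character of a class, where it negates the set.
bool negated = Regex.IsMatch("aaa", "^[^a]+$");             // false: every char is 'a'
bool caretLiteral = Regex.IsMatch("^", "[a^]");             // true: '^' is literal here

// To express "start of string OR a comma" before "abc", use alternation
// in a non-capturing group; a class cannot hold the zero-width '^'.
var startOrComma = new Regex("(?:^|,)abc");
bool atStart = startOrComma.IsMatch("abc,x");               // true
bool afterComma = startOrComma.IsMatch("x,abc");            // true
bool neither = startOrComma.IsMatch("xabc");                // false

Console.WriteLine($"{dollarLiteral} {negated} {caretLiteral} {atStart} {afterComma} {neither}");
// Prints: True False True True True False
```

The alternation works because each branch is a complete sub-pattern, so a zero-width branch can sit next to a consuming one; a character class can only list characters.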

1 Answer


To quote Wiktor Stribiżew's comment:

[\b] is a backspace char matching pattern, that is all.

So while \b is a zero-width word-boundary assertion outside a character class, inside a character class it refers to the backspace character (U+0008, 0x08 in ASCII). Further details are provided in this post.
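A small C# check of this point, again as top-level statements (C# 9 or later assumed; the inputs and variable names are illustrative). The same `\b` is an assertion in one position and a literal backspace in the other:

```csharp
using System;
using System.Text.RegularExpressions;

var text = "[abc]";

// Outside a character class, \b is a zero-width word-boundary assertion:
// it matches between '[' (non-word) and 'a' (word) without consuming anything.
bool boundary = Regex.IsMatch(text, @"\babc");              // true

// Inside a character class, \b means the backspace character U+0008,
// so this pattern needs a real backspace before "abc"; "[abc]" has none.
bool classOnText = Regex.IsMatch(text, @"[\b]abc");         // false

// A C# regular string literal "\b" is that same U+0008 character,
// so an input that actually contains a backspace does match.
bool classOnBackspace = Regex.IsMatch("\babc", @"[\b]abc"); // true

Console.WriteLine($"{boundary} {classOnText} {classOnBackspace}");
// Prints: True False True
```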

Wiktor: If you would like to post your own answer I would be happy to accept it over this one.

Jake