
I'd like to test if any of the symbols included in the regex pattern are present in my input string or not.

1. If Characters That Have Special Meaning in RegEx are Not Escaped

The problem is: if I do not escape the characters that have special meaning in regular expressions, such as [ and ], they do not seem to be included in the test. My understanding is that the regex engine reads the nested pair of square brackets as another character set to match the input against. So, something like this fails:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string pattern = "[[]]";
        string input = "Foobar";

        // Append each character of the pattern to the input and test the result.
        foreach (var c in pattern)
        {
            var tempInput = input + c;
            var isMatch = Regex.IsMatch(tempInput, pattern);
            Console.WriteLine($"{c} => {tempInput} => {isMatch}\n\n");
            Console.ReadKey();
        }
    }
}

Output:

[ => Foobar[ => False


[ => Foobar[ => False


] => Foobar] => False


] => Foobar] => False

2. If RegEx Special Characters Are Escaped

However, if I escape these characters in a regular string, C# does not recognize the escape sequences and the compiler complains:

string pattern = "[\[\]]"; // Compiler error: Unrecognized escape sequence

3. C# Verbatim String With Unescaped Regex Characters

And if I make the string a verbatim string by prefixing it with the @ symbol, which tells C# not to process escape sequences in it, then none of the inputs match the regular expression, since the unescaped brackets are still treated as special characters by the regex engine:

static void Main(string[] args)
{
    string pattern = @"[[]]";
    ...
}

Output:

[ => Foobar[ => False


[ => Foobar[ => False


] => Foobar] => False


] => Foobar] => False

4. C# Verbatim String With Escaped Regex Characters

And if I make the string a verbatim string in C# by prefixing it with the @ symbol, and also escape the special regular expression characters, then the brackets match, but the backward slashes (\) from the pattern are tested against the input as well:

static void Main(string[] args)
{
    string pattern = @"[\[\]]";  // @"[!@#$%^&*()_+-=[]{};':""\|,.<>/?]";
    ...
}

Output:

[ => Foobar[ => True


\ => Foobar\ => False


[ => Foobar[ => True


\ => Foobar\ => False


] => Foobar] => True


] => Foobar] => True

What do I do?

Water Cooler v2
  • The fourth solution is correct. You may also use `[][]` though to avoid having to escape `]`. Do you mean you have an issue with `@"[!@#$%^&*()_+-=[]{};':""\|,.<>/?]"`? `+-=` creates a range – Wiktor Stribiżew Oct 31 '19 at 08:35
  • @WiktorStribiżew Yes, that was my actual goal: to detect any of the symbols in the Latin character set in a given string. But I reduced the problem so as to ask a focused question here. I am aware of regex special characters. I escaped them too (the pluses and the minuses and the curlies and the square brackets and the parentheses as well) but wasn't sure how to properly escape them. Thanks. I am trying a few combinations out based on your input. – Water Cooler v2 Oct 31 '19 at 08:40
  • Do not overescape, see [What special characters must be escaped in regular expressions?](https://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions) – Wiktor Stribiżew Oct 31 '19 at 08:40
  • Thanks. I am looking at the other questions you linked to. Could you please explain why `"[][]"` works? – Water Cooler v2 Oct 31 '19 at 08:43
  • `[][]` is a character class that matches either `]` (as it is the first in the character class, it does not have to be escaped) or `[` (it does not have to be escaped at all inside a character class) – Wiktor Stribiżew Oct 31 '19 at 08:44
  • Oh, I see. The outer pair of square brackets are the containers of the character set and treated as special characters but the inside ones, instead of being written as `[]` are written in reverse as `][` so they do not make an inner, nested set. Is that the explanation? Soopper! Thank you! – Water Cooler v2 Oct 31 '19 at 08:46
  • Yes, but the `[]` would not make any nested set in this case, `[[]]` is a pattern that matches `[]` substring as `[[]` matches `[` and `]` matches a `]`. Another example to match all ASCII symbols/punctuation avoiding overescaping: ``@"[!-/:-@[-^`{-~]"`` – Wiktor Stribiżew Oct 31 '19 at 08:49
  • Ah, I see. There are four ranges in there, and the only non-adjacent one, the grave accent symbol (`), appears just before the last range, correct? – Water Cooler v2 Oct 31 '19 at 08:53
  • Yes, just have a look at the [ASCII table](http://www.asciitable.com/). – Wiktor Stribiżew Oct 31 '19 at 08:54
  • Soopper, just sooooopper! Thank you so much. :-) – Water Cooler v2 Oct 31 '19 at 08:55
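
The points from the comments above can be checked with a small sketch. It is only an illustration under a few assumptions: the class name `SymbolCheck`, the helper `ContainsMatch`, and the constant names are hypothetical and do not come from the question; only `Regex.IsMatch` and the patterns themselves are taken from the thread. Unlike the loop in the question, it iterates over the symbols to test rather than over the characters of the pattern, so the backslashes used for escaping are never appended to the input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named here only for illustration.
public static class SymbolCheck
{
    // Escaped brackets; the verbatim string passes the backslashes through to the regex engine.
    public const string Escaped = @"[\[\]]";

    // Same class without escapes: a leading ] is literal, and [ is literal inside a class.
    public const string Unescaped = "[][]";

    // Not a nested set: the class [[] followed by a literal ], so it matches only the substring "[]".
    public const string Misread = "[[]]";

    // ASCII punctuation ranges from the comments; underscore (0x5F) lies outside these ranges.
    public const string AsciiSymbols = @"[!-/:-@[-^`{-~]";

    // Thin wrapper around Regex.IsMatch (hypothetical name).
    public static bool ContainsMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    public static void Main()
    {
        // Iterate over the symbols under test, not over the pattern's own characters.
        foreach (char c in "[]")
        {
            string input = "Foobar" + c;
            Console.WriteLine(
                $"{c} => {input} => escaped: {ContainsMatch(input, Escaped)}, " +
                $"unescaped: {ContainsMatch(input, Unescaped)}, " +
                $"misread: {ContainsMatch(input, Misread)}");
        }

        // The "misread" pattern needs both brackets next to each other.
        Console.WriteLine(ContainsMatch("Foobar[]", Misread));

        // The range-based class flags any ASCII punctuation character.
        Console.WriteLine(ContainsMatch("Foo+bar", AsciiSymbols));
        Console.WriteLine(ContainsMatch("Foobar", AsciiSymbols));
    }
}
```

With these inputs, the escaped and unescaped classes should both report a match for `Foobar[` and `Foobar]`, while `[[]]` should match only once the input contains `[]` as a substring. Note that the range-based class does not cover the underscore, so `_` would have to be added explicitly if it should count as a symbol.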
