My string contains what could be perceived as regex.

var x = "a&\b";

I am trying to remove every character that is not a letter or a digit.

var z = Regex.Replace(x, "[^a-zA-Z0-9 -]", "", RegexOptions.IgnoreCase);

Expected result: ab
Actual result: a

I understand that \b is a regular expression word.

I also understand I can write var x = @"a&\b"; however, I wish to escape the variable, not the assignment.

How can I escape my variable x?

I have tried Regex.Escape()

Valamas
  • `\b` is the ASCII [backspace](http://en.wikipedia.org/wiki/Backspace_character) (0x08) character, not a [regular expression for word boundary](http://msdn.microsoft.com/en-us/library/az24scfc%28v=vs.110%29.aspx#atomic_zerowidth_assertions). It is not clear what your exact goal is... Maybe `"a&\b".Replace("\b", "\\b")`? – Alexei Levenkov May 03 '14 at 01:58
  • The string x can contain any value. Is that a safe enough escape? Do I just perform that replace and also use Regex.Escape()? – Valamas May 03 '14 at 02:08
  • Can you please edit your question to match accepted answer? Your "expected `ab`" is very confusing... – Alexei Levenkov May 03 '14 at 02:52
  • My apologies. After using Derek's solution, I thought I saw `ab`. I have unticked it as the answer. – Valamas May 03 '14 at 05:45

2 Answers

The initial regular expression would work - if the String contained what was expected.

This is because \ in a String Literal (except for a Verbatim String Literal) is the escape character. While this is mentioned in the question, the fundamental premise in the question is wrong and it has nothing to do with "\b is a regular expression word" because the string in question is not used as the regular expression pattern.

Literal  ->  actual String data
"a&\b"       {'a', '&', BACKSPACE}
"a&\\b"      {'a', '&', '\', 'b'}
@"a&\b"      {'a', '&', '\', 'b'}

As such, the original string does not contain a 'b' but rather the BACKSPACE character (0x08), which is removed because the original regular expression replacement does not accept it. BACKSPACE is, after all, not an alphanumeric character. Even if it were not removed, it would not display as a 'b' character, because it is BACKSPACE.
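To make the difference between the three literals concrete, here is a minimal sketch that prints the code points each one produces and then runs the filter from the question over them. The class name LiteralDemo and the helpers CodePoints and Filter are made up for this illustration; they are not part of the question or of .NET.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LiteralDemo
{
    // Hypothetical helper: formats each char as a 4-digit hex code point.
    public static string CodePoints(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

    // The filter from the question: keeps letters, digits, spaces and dashes.
    public static string Filter(string s) =>
        Regex.Replace(s, "[^a-zA-Z0-9 -]", "");

    public static void Main()
    {
        string regular = "a&\b";    // the compiler turns \b into U+0008
        string escaped = "a&\\b";   // \\ yields a single backslash, then 'b'
        string verbatim = @"a&\b";  // verbatim literal: no escape processing

        Console.WriteLine(CodePoints(regular));   // 0061 0026 0008
        Console.WriteLine(CodePoints(escaped));   // 0061 0026 005C 0062
        Console.WriteLine(escaped == verbatim);   // True

        Console.WriteLine(Filter(regular));       // a
        Console.WriteLine(Filter(verbatim));      // ab
    }
}
```

The regular expression behaves the same in every case; only the string data handed to it differs.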

While there is no generalized way in the .NET standard library [1] to reverse-escape from "\b" to "\\b"/@"\b", you may find this transformation function useful: you could then write x = EscapeLikeALiteral("a&\b"), after which x == "a&\\b", and obtain the desired "ab" result, even with the original regular expression [2].
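The body of EscapeLikeALiteral is not shown here, so the following is only a rough sketch of what such a helper might look like, not the referenced implementation. It assumes the goal is to map each control character back to the C# escape sequence that would have produced it, and it doubles backslashes that are already present. It does not handle quotes or surrogate pairs.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LiteralEscaper
{
    // Hypothetical helper (an assumption, not a .NET API): rewrites control
    // characters as C# escape sequences and doubles existing backslashes.
    public static string EscapeLikeALiteral(string s)
    {
        var sb = new StringBuilder(s.Length + 8);
        foreach (char c in s)
        {
            switch (c)
            {
                case '\\': sb.Append(@"\\"); break;
                case '\0': sb.Append(@"\0"); break;
                case '\a': sb.Append(@"\a"); break;
                case '\b': sb.Append(@"\b"); break;
                case '\f': sb.Append(@"\f"); break;
                case '\n': sb.Append(@"\n"); break;
                case '\r': sb.Append(@"\r"); break;
                case '\t': sb.Append(@"\t"); break;
                case '\v': sb.Append(@"\v"); break;
                default:
                    if (char.IsControl(c))
                        // Any other control character becomes \uXXXX.
                        sb.Append(@"\u").Append(((int)c).ToString("X4"));
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string x = EscapeLikeALiteral("a&\b");
        Console.WriteLine(x);                                       // a&\b
        Console.WriteLine(x == "a&\\b");                            // True
        Console.WriteLine(Regex.Replace(x, "[^a-zA-Z0-9 -]", ""));  // ab
    }
}
```

Keep in mind that this only guesses at what was typed in the source. A string that legitimately contains a backspace (read from a file, say) would be rewritten the same way.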


[1] The Regex.Escape/Regex.Unescape methods are only suitable for use with regular expression patterns and not this generalized task of "reverse escaping strings to literals".

[2] Strictly speaking, the original regular expression is not an alphanumeric filter as it also allows spaces and dashes.

user2864740

Instead of your code how about using \W?

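\w Matches any word character: [a-zA-Z_0-9] for ASCII input (by default .NET also counts Unicode letters and digits as word characters, unless RegexOptions.ECMAScript is used)

\W Matches any non-word character: [^a-zA-Z_0-9], under the same caveat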

So I am suggesting you use:

var z = Regex.Replace(x, @"\W", "", RegexOptions.IgnoreCase);

You might be able to use:

var z = Regex.Replace(x, "[^a-zA-Z_0-9]", "", RegexOptions.IgnoreCase);

But I think the first one is nicer.
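For comparison, here is a small sketch of how \W and the character class from the question differ on the same input. The class WordClassDemo and its method names are invented for this example, and the sample string is arbitrary. Note the verbatim literal @"\W", which lets the backslash reach the regex engine.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // \W with default options: Unicode letters, digits and '_' are kept.
    public static string StripNonWord(string s) =>
        Regex.Replace(s, @"\W", "");

    // \W with ECMAScript behaviour: only [a-zA-Z_0-9] is kept.
    public static string StripNonWordAscii(string s) =>
        Regex.Replace(s, @"\W", "", RegexOptions.ECMAScript);

    // The class from the question: keeps ASCII letters, digits, space, dash.
    public static string StripOriginal(string s) =>
        Regex.Replace(s, "[^a-zA-Z0-9 -]", "");

    public static void Main()
    {
        string x = "a\u00E9_b-c d"; // 'a', e-acute, '_', 'b', '-', 'c', ' ', 'd'

        Console.WriteLine(StripNonWord(x) == "a\u00E9_bcd"); // True
        Console.WriteLine(StripNonWordAscii(x) == "a_bcd");  // True
        Console.WriteLine(StripOriginal(x) == "ab-c d");     // True
    }
}
```

None of the three variants turns the backspace in "a&\b" into a 'b'; they differ only in which printable characters survive.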

Derek
  • How does this reproduce "ab"? – hwnd May 03 '14 at 02:42
  • The original regular expression would work just as well in this case; it also has a slightly different meaning of "alphanumeric". While these are approximate alternatives, neither answers the question or addresses the fundamental issue. – user2864740 May 03 '14 at 04:19