
I need to escape these characters: +-&|!(){}[]^"~*?:\ by preceding each of them with a backslash (\). What is the best way to do this? My first thought was to use Replace, but that would search the string once for each character to replace. I'm thinking there must be a way to do it with a regular expression that handles all of them in one pass.

Kaushik
aepheus
  • See this question: http://stackoverflow.com/questions/323640/can-i-convert-a-c-string-value-to-an-escaped-string-literal – James Johnson Oct 06 '11 at 19:55

5 Answers


It's possible with a regular expression. The trickiest part is correctly escaping the special characters without getting into backslash hell:

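using System.Text.RegularExpressions;

// $0 is the whole match; "\\$0" puts a literal backslash in front of it.
s = Regex.Replace(s, @"[+\-&|!(){}[\]^""~*?:\\]", "\\$0");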
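Because the hard part of this one-liner is getting the character class right, one quick sanity check is to run every printable ASCII character through the same pattern and collect the ones it matches. This is only a sketch; the class and method names are hypothetical. If the class is escaped correctly, exactly the 18 characters from the question come back, listed in code-point order.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for checking which characters the pattern matches.
public static class PatternCheck
{
    // The pattern from the answer above, as a verbatim string ("" is one quote).
    public const string Pattern = @"[+\-&|!(){}[\]^""~*?:\\]";

    // Returns every printable ASCII character (space through '~') that the
    // pattern matches, in code-point order.
    public static string MatchedAsciiChars()
    {
        StringBuilder sb = new StringBuilder();
        for (char c = ' '; c <= '~'; c++)
        {
            if (Regex.IsMatch(c.ToString(), Pattern))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

If a character were missing from the result, or an extra one appeared (for example because an unescaped '-' had turned two neighbours into a range), the class would need fixing.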

The StringBuilder solution mentioned by Eric J. is simple and quite elegant. Here's one way to code it:

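using System.Linq; // on .NET Framework, string has no Contains(char), so this resolves to LINQ's Enumerable.Contains
using System.Text;

StringBuilder sb = new StringBuilder();
foreach (char c in s)
{
    // Prefix each special character with a backslash, then copy the character itself.
    if ("+-&|!(){}[]^\"~*?:\\".Contains(c))
    {
        sb.Append('\\');
    }
    sb.Append(c);
}
s = sb.ToString();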
Mark Byers
  • If the string is at all large use the StringBuilder constructor that allows you to supply an initial size and set it to slightly larger than the original string. – Eric J. Oct 06 '11 at 20:08

Using a StringBuilder would probably be a better option than regex. Here is an MSDN post to support the idea: Regex.Replace vs String.Replace vs StringBuilder.Replace

using System.Text;

// The backslash comes first, so the backslashes added for the other
// characters are never escaped a second time.
public const string CharsToBeEscaped = "\\+-&|!(){}[]^\"~*?:";

string s = "+-&|!(){}[]^\"~*?:\\";

StringBuilder sb = new StringBuilder();
sb.Append( s );

// One Replace pass per character to escape.
for ( int i = 0; i < CharsToBeEscaped.Length; i++ ) {
    sb.Replace( CharsToBeEscaped.Substring(i,1), @"\" + CharsToBeEscaped[i] );
}

s = sb.ToString();
DJ Quimby
  • +½ for suggesting StringBuilder. However, having looked at your article, I think it demonstrates a slightly different approach than the code in my answer. I think it's not as efficient, but I'm not sure. Also, I think the article is difficult to read. Could you post the code you would use? Anyway, I upvoted one of your answers to a different question instead. :) – Mark Byers Oct 06 '11 at 20:19
  • @MarkByers Here would be my implementation. – DJ Quimby Oct 06 '11 at 20:44
  • +1 for providing the code. I'm still a bit concerned about the performance compared to my code... I haven't tested it, but I suspect this is slower due to the repeated replacements. There's also a bug: it first escapes special characters with backslashes and then afterwards escapes *those* backslashes with another backslash, which I don't think is what he wants. – Mark Byers Oct 06 '11 at 20:51
  • Fixed the double slash issue, but either way I think you are correct about performance, yours would take the cake. Always good to see LINQ as well since I haven't become very familiar with it yet. In any event, writing this code was good practice ;) – DJ Quimby Oct 06 '11 at 21:16
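The ordering problem Mark Byers describes can be reproduced with a small sketch (class and method names are hypothetical) that runs the same per-character StringBuilder.Replace loop twice: once with the backslash last in the list and once with it first. Escaping the backslash first means only backslashes from the original input get doubled, so no clean-up pass is needed afterwards. It still scans the whole string once per character, which is the performance concern raised above.

```csharp
using System.Text;

// Hypothetical demo of why Replace order matters when '\' is itself escaped.
public static class ReplaceOrderDemo
{
    // Backslash LAST: the backslashes inserted for earlier characters
    // are escaped a second time.
    public static string BackslashLast(string s)
    {
        StringBuilder sb = new StringBuilder(s);
        foreach (char c in "+-&|!(){}[]^\"~*?:\\")
        {
            sb.Replace(c.ToString(), "\\" + c);
        }
        return sb.ToString();
    }

    // Backslash FIRST: only backslashes present in the input are doubled;
    // the ones inserted afterwards are never revisited.
    public static string BackslashFirst(string s)
    {
        StringBuilder sb = new StringBuilder(s);
        foreach (char c in "\\+-&|!(){}[]^\"~*?:")
        {
            sb.Replace(c.ToString(), "\\" + c);
        }
        return sb.ToString();
    }
}
```

For the input a+b, BackslashLast returns a\\+b (two backslashes), while BackslashFirst returns the intended a\+b.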

Strings are immutable in C#, meaning that every string.Replace() will create a new, modified copy of the original string.

For many applications that really will not matter. Since you're asking about it, though, I assume it may in your case.

The most efficient approach is probably to use a StringBuilder to build up your modified string. Loop through the source string once, and either append the character at each string position, or an escaped version, as applicable. Use the StringBuilder constructor that pre-allocates the initial internal buffer size to be slightly larger than the source string.
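The single-pass approach described above might look something like this. It is a sketch, not a benchmarked implementation: the class and method names are hypothetical, and the extra 16 characters of initial capacity are an arbitrary guess at how many escapes to expect. A switch statement is used for the membership test so the code does not have to search a lookup string for every character.

```csharp
using System.Text;

// Hypothetical single-pass escaper: walk the source once and append either
// the character itself or a backslash followed by the character.
public static class SinglePassEscaper
{
    public static string Escape(string s)
    {
        // Pre-size the buffer slightly larger than the source string.
        StringBuilder sb = new StringBuilder(s.Length + 16);
        foreach (char c in s)
        {
            if (NeedsEscape(c))
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    // True for the 18 characters listed in the question.
    private static bool NeedsEscape(char c)
    {
        switch (c)
        {
            case '+': case '-': case '&': case '|': case '!':
            case '(': case ')': case '{': case '}': case '[':
            case ']': case '^': case '"': case '~': case '*':
            case '?': case ':': case '\\':
                return true;
            default:
                return false;
        }
    }
}
```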

Regex, which most other answers suggest, will probably also be quite efficient for this particular application and involves less code. However, because Regex must apply generalized matching logic, it cannot be quite as fast as a solution tuned to your specific need. Also, in some cases (probably not this one) Regex can be very slow. See:

http://en.wikipedia.org/wiki/.NET_Framework_version_history#Common_Language_Runtime_.28CLR.29

http://www.codinghorror.com/blog/2006/01/regex-performance.html
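On the .NET 4.5 change in the first link above (a cap on regex execution time), here is a hedged sketch of opting into that timeout for the one-pass regex. The class name and the 250 ms limit are arbitrary assumptions for illustration; the Regex constructor overload that takes a TimeSpan is available from .NET 4.5 onward. A single character class like this one should never come close to the limit, so the timeout mainly matters for more complex patterns.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: one-pass regex escaping with a match timeout.
public static class TimedRegexEscaper
{
    // Built once and reused; a match operation that runs longer than the
    // limit throws RegexMatchTimeoutException.
    private static readonly Regex SpecialChars = new Regex(
        @"[+\-&|!(){}\[\]^""~*?:\\]",
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250)); // arbitrary example limit

    public static string Escape(string s)
    {
        // "$0" is the whole match; the leading backslash is literal.
        return SpecialChars.Replace(s, @"\$0");
    }
}
```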

Eric J.
  • Minor quibble. Your description of the capacity of the StringBuilder isn't quite accurate. In .NET 3.5 and earlier the capacity was always 2^n. So it was anywhere between exactly equal to the source string and almost twice as large as the source string. In 4.0 the capacity is the same as the source string. (For both the min capacity = 16). – Conrad Frix Oct 06 '11 at 20:25

The best way to do this is surely with regular expressions (Regex)!

string str = @"+-&|!(){}[]^""~*?:\";
string pattern = @"(\+|\-|\&|\||\!|\(|\)|\{|\}|\[|\]|\^|\""|\~|\*|\?|\:|\\)";
string output = Regex.Replace(str, pattern, @"\$1");

This gives the following output:

\+\-\&\|\!\(\)\{\}\[\]\^\"\~\*\?\:\\
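The alternation above can also be written as a single character class, as in the first answer. The following sketch (class and member names are hypothetical) runs both patterns side by side. Using $0, the whole match, in the replacement lets the same call work for both, since the character class has no capture group for $1 to refer to.

```csharp
using System.Text.RegularExpressions;

// Hypothetical comparison of the alternation pattern and a character class.
public static class AlternationVsClass
{
    // Pattern from the answer above: one escaped alternative per character.
    public const string Alternation =
        @"(\+|\-|\&|\||\!|\(|\)|\{|\}|\[|\]|\^|\""|\~|\*|\?|\:|\\)";

    // The same set of characters as one character class.
    public const string CharClass = @"[+\-&|!(){}\[\]^""~*?:\\]";

    public static string EscapeWith(string pattern, string input)
    {
        // Prefix every match with a literal backslash.
        return Regex.Replace(input, pattern, @"\$0");
    }
}
```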
Arnaud F.
  • Why? Regexes are convenient but can be quite slow. In fact, one of the few featured improvements in .NET 4.5 is to cap the execution time of regular expressions. http://en.wikipedia.org/wiki/.NET_Framework_version_history#Common_Language_Runtime_.28CLR.29 – Eric J. Oct 06 '11 at 19:56

Disclaimer: Do read the arguments in other answers about not using regex if this could cause a performance problem for your application (for example, if this is a very big string with lots of instances of your escapable characters). However, if regex is your choice, the following shows how to do it in one line of code.

It's Regex.Replace that you're looking for. You supply the input, the regular expression you're searching for, and a MatchEvaluator, which runs for every match. In your case, you just return String.Concat(@"\", match.Value).

Something like this (input is your string):

var replaced = Regex.Replace(input, // your string
         @"[\+\-&|!]", // partial regex to give you an idea
         match => String.Concat(@"\", match.Value)); // MatchEvaluator, runs for each match
Jamiec
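Filling in the partial pattern above with the full character class from the first answer gives a complete version of the MatchEvaluator approach. This is a sketch, and the wrapper class and method names are hypothetical. For a fixed prefix like this, the plain replacement string \$0 does the same job without a delegate call per match; an evaluator becomes useful when the replacement depends on the matched text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around Regex.Replace with a MatchEvaluator lambda.
public static class EvaluatorEscaper
{
    public static string Escape(string input)
    {
        return Regex.Replace(
            input,
            @"[+\-&|!(){}\[\]^""~*?:\\]",               // all 18 characters from the question
            match => string.Concat(@"\", match.Value)); // runs once per match
    }
}
```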