
I need to do something like this imaginary .trReplace method:

  str = str.trReplace("áéíüñ","aeiu&");

It should change this string:

  a stríng with inválid charactérs

to:

  a string with invalid characters

My current ideas are:

 str = str.Replace("á","a").Replace("é","e").Replace("í","i")...

and:

 var sb = new StringBuilder(str);
 sb.Replace("á","a");
 sb.Replace("é","e");
 sb.Replace("í","i")...

But I don't think they are efficient for long strings, since each Replace call makes another pass over the whole string.
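A minimal sketch of what the imagined trReplace could look like as a C# extension method, assuming it behaves like Unix tr: the character at position i of the first argument is replaced by the character at position i of the second. TrExtensions and TrReplace are hypothetical names, not part of .NET. The point of the sketch is that the input is walked once, instead of once per Replace call.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical extension method modelled on the imagined str.trReplace(from, to).
public static class TrExtensions
{
    public static string TrReplace(this string source, string fromChars, string toChars)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (fromChars == null) throw new ArgumentNullException(nameof(fromChars));
        if (toChars == null) throw new ArgumentNullException(nameof(toChars));
        if (fromChars.Length != toChars.Length)
            throw new ArgumentException("fromChars and toChars must have the same length");

        // Build the char-to-char lookup once per call.
        // If fromChars contains duplicates, the later mapping wins here.
        var map = new Dictionary<char, char>(fromChars.Length);
        for (int i = 0; i < fromChars.Length; i++)
            map[fromChars[i]] = toChars[i];

        // Single pass over the input: copy each char, substituting mapped ones.
        var sb = new StringBuilder(source.Length);
        foreach (char c in source)
            sb.Append(map.TryGetValue(c, out char mapped) ? mapped : c);
        return sb.ToString();
    }
}
```

Usage mirrors the call in the question: `str = str.TrReplace("áéíüñ", "aeiu&");`. For repeated calls with the same mapping, building the dictionary once (as the answers below do) avoids rebuilding it on every call.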

MiguelM

4 Answers


Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers already on StackOverflow, listed below:

Fastest way to remove chars from string

C# Stripping / converting one or more characters

There is also a good article on CodeProject covering the different options.

http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx

The function in Richard's answer gets slower with longer strings because it appends to the result one character at a time: if the input contains long runs of characters that need no mapping, you waste extra cycles re-appending them individually. Taking a few points from the CodeProject article, you end up with a slightly modified version of Richard's answer that looks like this:

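using System;
using System.Collections.Generic;
using System.Text;

public static class Transliterator
{
  // Characters to search for; must stay in sync with the keys of ReplacementMappings.
  private static readonly Char[] ReplacementChars = new[] { 'á', 'é', 'í', 'ü', 'ñ' };

  // Maps each character to be replaced to its replacement.
  private static readonly Dictionary<Char, Char> ReplacementMappings = new Dictionary<Char, Char>
  {
    { 'á', 'a' },
    { 'é', 'e' },
    { 'í', 'i' },
    { 'ü', 'u' },
    { 'ñ', '&' }
  };

  public static string Translate(String source)
  {
    var startIndex = 0;
    var currentIndex = 0;
    var result = new StringBuilder(source.Length);

    // Jump from one replaceable character to the next, copying the unchanged
    // run in between as a single block instead of char by char.
    while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
    {
      result.Append(source, startIndex, currentIndex - startIndex);
      result.Append(ReplacementMappings[source[currentIndex]]);

      startIndex = currentIndex + 1;
    }

    // Nothing was replaced: return the original string without copying it.
    if (startIndex == 0)
      return source;

    // Copy the tail after the last replaced character.
    result.Append(source, startIndex, source.Length - startIndex);

    return result.ToString();
  }
}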

NOTE: Not all edge cases have been tested.

NOTE: ReplacementChars could be replaced with ReplacementMappings.Keys.ToArray() (which needs System.Linq), at a slight cost.
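A small sketch of that second NOTE, assuming the same five mappings: deriving the search array from the dictionary's keys keeps the two from drifting apart. If the derivation is done in a static field initializer, as here, ToArray() runs once at type initialization rather than on every call. C# runs static field initializers in declaration order, so the dictionary must be declared first. The Mappings class name is illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Mappings
{
    // Single source of truth for the char-to-char mappings.
    public static readonly Dictionary<char, char> ReplacementMappings = new Dictionary<char, char>
    {
        { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
    };

    // Declared after ReplacementMappings, so the dictionary is already
    // populated when this initializer copies its keys (once) into an array
    // suitable for String.IndexOfAny.
    public static readonly char[] ReplacementChars = ReplacementMappings.Keys.ToArray();
}
```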

Assuming that NOT every character is a replacement character, this actually runs slightly faster than straight string replacements (again, by about 20%).

That being said, when considering performance cost, remember what we are actually talking about: in this case, the difference between the optimized solution and the original solution is about 1 second over 100,000 iterations on a 1,000-character string.
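To check figures like these on your own machine, a rough Stopwatch harness along these lines can be used. It is a sketch, not a rigorous benchmark (single run, no control for GC or other noise), and ReplaceBenchmark, OnePass, Time, and MakeInput are hypothetical names. The 1,000-character input is built by repeating the question's sample sentence, and the one-pass method here is a plain dictionary loop rather than the IndexOfAny version above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Rough timing sketch: the question's chained Replace vs. a one-pass dictionary lookup.
public static class ReplaceBenchmark
{
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
    };

    // The question's approach: one full pass over the string per Replace call.
    public static string ChainedReplace(string s) =>
        s.Replace("á", "a").Replace("é", "e").Replace("í", "i").Replace("ü", "u").Replace("ñ", "&");

    // One pass: look each character up in the dictionary.
    public static string OnePass(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
            sb.Append(Map.TryGetValue(c, out char m) ? m : c);
        return sb.ToString();
    }

    // Returns elapsed milliseconds for `iterations` calls of `f` on `input`.
    public static long Time(Func<string, string> f, string input, int iterations)
    {
        f(input); // warm-up call so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            f(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Repeats the question's sample sentence, then truncates to exactly `length` chars.
    public static string MakeInput(int length)
    {
        const string sample = "a stríng with inválid charactérs ";
        var sb = new StringBuilder(length);
        while (sb.Length < length)
            sb.Append(sample);
        sb.Length = length;
        return sb.ToString();
    }
}
```

For example, compare `ReplaceBenchmark.Time(ReplaceBenchmark.ChainedReplace, ReplaceBenchmark.MakeInput(1000), 100000)` with the same call using `ReplaceBenchmark.OnePass`. Absolute numbers will vary with hardware, runtime version, and how many characters in the input actually need replacing.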

Either way, I just wanted to add some information to the answers for this question.

Chris Baxter

I did something similar for ICAO passports, where the names had to be 'transliterated'. Basically, I had a Dictionary of char-to-char mappings.

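using System.Collections.Generic;
using System.Text;

public static class Transliteration
{
   // Char-to-char mappings. The ICAO table itself isn't shown here;
   // the question's mappings are used as example entries.
   static readonly Dictionary<char, char> mappings = new Dictionary<char, char>
   {
      { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
   };

   public static string Translate(string s)
   {
      var t = new StringBuilder(s.Length);
      foreach (char c in s)
      {
         // Append the mapped character if there is one, otherwise the original.
         char to;
         if (mappings.TryGetValue(c, out to))
            t.Append(to);
         else
            t.Append(c);
      }
      return t.ToString();
   }
}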
Chris Baxter
Richard Schneider
  • Thanks, it looks efficient to me. I'll start coding this (I'll vote you up as soon as I have enough reputation :-) – MiguelM May 30 '11 at 01:15
  • @Richard - Made a minor bug correction. Since I had the code set up to benchmark your approach against the question's approach, I noticed that this runs in the same time as the replace on short strings, and is actually slower on longer strings. Thoughts? – Chris Baxter May 30 '11 at 01:24

What you want is a way to go through the string once and do all the replacements. I am not sure that a regex is the best way to do it if you want efficiency. It could very well be that a switch statement (with a case for each character you want to replace) inside a for loop that tests every character is faster. I would profile the two approaches.
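A minimal sketch of the switch-in-a-loop idea, hard-coding the question's five mappings; SwitchTranslator is a hypothetical name. As a small extra not in the answer, it only allocates a copy when it meets the first character that needs changing, and otherwise returns the original string. Whether it beats a dictionary or regex approach is exactly what profiling, as suggested above, would tell you.

```csharp
// One-pass translation using a switch instead of a dictionary lookup.
public static class SwitchTranslator
{
    public static string Translate(string s)
    {
        char[] buffer = null; // allocated lazily, only if something changes
        for (int i = 0; i < s.Length; i++)
        {
            char replacement;
            switch (s[i])
            {
                case 'á': replacement = 'a'; break;
                case 'é': replacement = 'e'; break;
                case 'í': replacement = 'i'; break;
                case 'ü': replacement = 'u'; break;
                case 'ñ': replacement = '&'; break;
                default: continue; // not a mapped char: move on to the next one
            }
            if (buffer == null)
                buffer = s.ToCharArray(); // copy the original on the first change
            buffer[i] = replacement;
        }
        return buffer == null ? s : new string(buffer);
    }
}
```

The switch keeps the mapping in code rather than data, so changing the character set means recompiling; the dictionary-based answers trade a hash lookup per character for that flexibility.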

soandos

It would be better to use a char array instead of a StringBuilder. Writing through the array indexer is faster than calling the Append method, because each Append call has to:

  • push all local variables to the stack
  • move to Append address
  • return to address
  • pop all local variables from the stack

The example below is about 20 percent faster (results depend on your hardware and input string).

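using System.Collections.Generic;

public static class ArrayTranslator
{
    // Example mappings (the question's); populate with your own as needed.
    static readonly Dictionary<char, char> mappings = new Dictionary<char, char>
    {
        { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
    };

    public static string TranslateV2(string s)
    {
        var len = s.Length;
        // The output has the same length as the input, so write straight into a char[].
        var array = new char[len];

        for (var index = 0; index < len; index++)
        {
            var c = s[index];
            // TryGetValue does a single lookup instead of ContainsKey plus the indexer.
            array[index] = mappings.TryGetValue(c, out var mapped) ? mapped : c;
        }

        return new string(array);
    }
}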
Stanislav