C# - Efficient search and replace char array in string

Question

I have e.g.

string str = "Àpple";
string strNew="";
char[] A = {'À','Á','Â','Ä'};
char[] a = {'à','á','â','ä'};

I want to scan str and, wherever one of these characters is found, replace it with the plain ASCII letter ('A' or 'a'). So the result should be:

strNew = "Apple";

Here is my code:

for (int i = 0; i < str.Length; i++)
{
    if (str[i].CompareTo(A))
        strNew += 'A';
    else if (str[i].CompareTo(a))
        strNew += 'a';
    else
        strNew += str[i];
}

But the compare function doesn't work, so what other function can I use?

It looks like you are trying to strip diacritics. Check out [this answer](http://stackoverflow.com/a/249126/335858) for info on how to do it efficiently and reliably for all UNICODE characters, not only `A`s. — Sergey Kalinichenko, Jun 19 '12 at 18:01
possible duplicate of [How do I remove diacritics (accents) from a string in .NET?](http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net) — arcain, Jun 19 '12 at 18:09

Jon Skeet · Accepted Answer · 2012-06-19T18:02:33.280

5

It sounds like you could just use:

if (A.Contains(str[i]))

but there are certainly more efficient ways of doing this. In particular, avoid string concatenation in a loop.

My guess is that there are Unicode normalization approaches which don't require you to hard-code all this data, too. I'm sure I remember one somewhere, around encoding fallbacks, but I can't put my finger on it... EDIT: I suspect it's around String.Normalize - worth a look, at least.
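A minimal sketch of the normalization idea, assuming the goal is to strip accents from any letter rather than only the hard-coded ones. The class and method names (`DiacriticsDemo`, `RemoveDiacritics`) are hypothetical. `String.Normalize`, `CharUnicodeInfo.GetUnicodeCategory` and `StringBuilder` are standard .NET APIs.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DiacriticsDemo
{
    // Hypothetical helper: decompose, drop combining marks, recompose.
    public static string RemoveDiacritics(string input)
    {
        // FormD splits 'À' into 'A' followed by U+0300 (combining grave accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except the non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Àpple")); // prints "Apple"
    }
}
```

This only removes marks that Unicode treats as separate combining characters after decomposition. Letters with no canonical decomposition, such as 'Ø', pass through unchanged and would still need an explicit mapping.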

At the very least, this would be more efficient:

char[] mutated = new char[str.Length];
for (int i = 0; i < str.Length; i++)
{
    // You could use a local variable to avoid calling the indexer three
    // times if you really want...
    mutated[i] = A.Contains(str[i]) ? 'A'
               : a.Contains(str[i]) ? 'a'
               : str[i];
}
string strNew = new string(mutated);
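On why concatenation in a loop is worth avoiding: strings are immutable, so each `+=` allocates a new string and copies everything accumulated so far, which makes the loop quadratic in the input length. When the output length is not known up front, a `StringBuilder` is the usual alternative to the char array. The sketch below assumes the same two arrays as the question; the names `ReplaceDemo`, `UpperA`, `LowerA` and `ReplaceAccentedA` are hypothetical. It uses `Array.IndexOf` so that it does not depend on `using System.Linq;`, which `A.Contains(...)` on a `char[]` requires.

```csharp
using System;
using System.Text;

static class ReplaceDemo
{
    // Same data as the question's A and a arrays (renamed for readability).
    static readonly char[] UpperA = { 'À', 'Á', 'Â', 'Ä' };
    static readonly char[] LowerA = { 'à', 'á', 'â', 'ä' };

    public static string ReplaceAccentedA(string str)
    {
        // One growable buffer instead of a new string per iteration.
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (Array.IndexOf(UpperA, c) >= 0)
                sb.Append('A');
            else if (Array.IndexOf(LowerA, c) >= 0)
                sb.Append('a');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ReplaceAccentedA("Àpple")); // prints "Apple"
    }
}
```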

edited Jun 19 '12 at 18:02

answered Jun 19 '12 at 17:57

Jon Skeet


It's a little more work than `String.Normalize` - you need to remove non-spacing characters after normalization. Here is a [link](http://stackoverflow.com/a/249126/335858) to an answer on removing diacritics. – Sergey Kalinichenko Jun 19 '12 at 18:03
thx Jon, can you plz tell me why is it bad to do string concatenation in a loop? – Benk Jun 19 '12 at 18:08
@dasblinkenlight: Yes, it's not just a single call - but that's the crux of it. There are simpler alternatives to explicitly calling `GetUnicodeCategory` on each character yourself, e.g. using an ASCII encoding with a replacement fallback of "". – Jon Skeet Jun 19 '12 at 18:12

score 2 · Answer 2 · answered Jun 19 '12 at 17:57

2

This should work:

for (int i = 0; i < str.Length; i++)
{
    // Contains on a char[] is a LINQ extension method (needs using System.Linq;)
    if (A.Contains(str[i]))
        strNew += 'A';
    else if (a.Contains(str[i]))
        strNew += 'a';
    else
        strNew += str[i];
}

answered Jun 19 '12 at 17:57

Samy Arous


Marcel N. · Answer 3 · 2012-06-19T18:24:57.170

0

Try a regex (first replace with "A", then with "a"):

string result = Regex.Replace("Àpple", "([ÀÁÂÄ])", "A", RegexOptions.None);

And then you can do the same for "a".
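Putting both passes together might look like the sketch below. It assumes only the four accented variants from the question need handling; the names `RegexDemo` and `ReplaceAccentedA` are hypothetical. A character class does not need a capturing group, so the parentheses are dropped here.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    public static string ReplaceAccentedA(string input)
    {
        // First pass: upper-case variants become 'A'.
        string result = Regex.Replace(input, "[ÀÁÂÄ]", "A");
        // Second pass: lower-case variants become 'a'.
        return Regex.Replace(result, "[àáâä]", "a");
    }

    static void Main()
    {
        Console.WriteLine(ReplaceAccentedA("Àpple älso")); // prints "Apple also"
    }
}
```

Each pass scans the whole string, so for large inputs a single loop over the characters is likely cheaper than two regex replacements.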

edited Jun 19 '12 at 18:24

answered Jun 19 '12 at 18:06

Marcel N.

