
I have, for example:

string str = "Àpple";
string strNew = "";
char[] A = {'À','Á','Â','Ä'};
char[] a = {'à','á','â','ä'};

I want to scan through str and, wherever one of these accented characters is found, replace it with the plain ASCII letter 'A' (or 'a' for the lower-case ones). So the result should be:

strNew = "Apple";

Here is my code:

for (int i = 0; i < str.Length; i++)
{
    if (str[i].CompareTo(A))
        strNew += 'A';
    else if (str[i].CompareTo(a))
        strNew += 'a';
    else
        strNew += str[i];
}

But CompareTo doesn't work here, so what other function can I use?

Bo Persson
Benk
  • Look-up table and StringBuilder. Less code and faster. – Adriano Repetti Jun 19 '12 at 17:59
  • It looks like you are trying to strip diacritics. Check out [this answer](http://stackoverflow.com/a/249126/335858) for info on how to do it efficiently and reliably for all Unicode characters, not only `A`s. – Sergey Kalinichenko Jun 19 '12 at 18:01
  • possible duplicate of [How do I remove diacritics (accents) from a string in .NET?](http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net) – arcain Jun 19 '12 at 18:09
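
Following the "look-up table and StringBuilder" suggestion in the first comment, here is a minimal sketch. The class and method names (AccentFolder, FoldAccents) are hypothetical, and the table only lists the characters from the question, so anything else passes through unchanged.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AccentFolder
{
    // Hypothetical look-up table: each accented character maps to its plain ASCII letter.
    // Only the characters from the question are listed; extend as needed.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['À'] = 'A', ['Á'] = 'A', ['Â'] = 'A', ['Ä'] = 'A',
        ['à'] = 'a', ['á'] = 'a', ['â'] = 'a', ['ä'] = 'a',
    };

    public static string FoldAccents(string input)
    {
        // StringBuilder appends into one growing buffer instead of
        // allocating a brand-new string on every += as the original loop does.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Characters not found in the table are copied unchanged.
            sb.Append(Map.TryGetValue(c, out char replacement) ? replacement : c);
        }
        return sb.ToString();
    }
}
```

For example, AccentFolder.FoldAccents("Àpple") gives "Apple". A dictionary lookup also avoids scanning both arrays for every character.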

3 Answers


It sounds like you could just use (with using System.Linq;, since Contains on an array is the LINQ extension method):

if (A.Contains(str[i]))

but there are certainly more efficient ways of doing this. In particular, avoid string concatenation in a loop: strings are immutable, so every += allocates a new string and copies everything built so far.

My guess is that there are Unicode normalization approaches which don't require you to hard-code all this data, too. I'm sure I remember one somewhere, around encoding fallbacks, but I can't put my finger on it... EDIT: I suspect it's around String.Normalize - worth a look, at least.
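
A minimal sketch of the String.Normalize idea, along the lines of the linked diacritics answer: decompose to Normalization Form D so that, for example, 'À' becomes 'A' followed by a combining grave accent, drop every character whose category is NonSpacingMark, then recompose. DiacriticStripper and RemoveDiacritics are hypothetical names. Letters that have no decomposition (such as 'ø' or 'ß') are left as they are.

```csharp
using System.Globalization;
using System.Text;

public static class DiacriticStripper
{
    public static string RemoveDiacritics(string input)
    {
        // Form D splits precomposed letters into base letter + combining mark(s),
        // e.g. "À" (U+00C0) becomes "A" (U+0041) + U+0300 COMBINING GRAVE ACCENT.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Skip the combining marks; keep everything else.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }

        // Recompose whatever remains back into precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Unlike a hand-written table, this covers every letter Unicode can decompose, not just the A variants.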

At the very least, this would be more efficient:

char[] mutated = new char[str.Length];
for (int i = 0; i < str.Length; i++)
{
    // You could use a local variable to avoid calling the indexer three
    // times if you really want...
    mutated[i] = A.Contains(str[i]) ? 'A'
               : a.Contains(str[i]) ? 'a'
               : str[i];
}
string strNew = new string(mutated);
Jon Skeet
  • It's a little more work than `String.Normalize` - you need to remove non-spacing characters after normalization. Here is a [link](http://stackoverflow.com/a/249126/335858) to an answer on removing diacritics. – Sergey Kalinichenko Jun 19 '12 at 18:03
  • Thanks Jon, can you please tell me why it is bad to do string concatenation in a loop? – Benk Jun 19 '12 at 18:08
  • @dasblinkenlight: Yes, it's not just a single call - but that's the crux of it. There are simpler alternatives to explicitly calling `GetUnicodeCategory` on each character yourself, e.g. using an ASCII encoding with a replacement fallback of "". – Jon Skeet Jun 19 '12 at 18:12
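
The encoding-fallback alternative from the last comment could look roughly like this (a sketch; AsciiFolder and ToAscii are hypothetical names). The string is first decomposed with Form D, then encoded as ASCII with an empty replacement fallback, so every non-ASCII char, including the combining accents, is simply dropped.

```csharp
using System.Text;

public static class AsciiFolder
{
    // ASCII encoding whose fallbacks replace any unencodable character with ""
    // (i.e. the character is dropped rather than turned into '?').
    private static readonly Encoding AsciiDropping = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderReplacementFallback(string.Empty));

    public static string ToAscii(string input)
    {
        // Decompose first so "À" becomes "A" + combining grave accent;
        // the accent is non-ASCII and is removed by the encoder fallback.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        byte[] bytes = AsciiDropping.GetBytes(decomposed);
        return AsciiDropping.GetString(bytes);
    }
}
```

Be aware this is more aggressive than the NonSpacingMark filter: characters with no ASCII base letter are removed outright, so "Straße" becomes "Strae".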

This should work:

for (int i = 0; i < str.Length; i++)
{
    if (A.Contains(str[i]))
        strNew += 'A';
    else if (a.Contains(str[i]))
        strNew += 'a';
    else
        strNew += str[i];
}
Samy Arous

Try a regex (first replace with "A", then with "a"):

string result = Regex.Replace("Àpple", "([ÀÁÂÄ])", "A", RegexOptions.None);

And then you can do the same for "a".
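
Putting both passes together gives a small sketch like the one below (RegexFolder and FoldAs are hypothetical names). The character classes contain only the letters from the question; the capturing group in the pattern above isn't needed, so it is omitted here.

```csharp
using System.Text.RegularExpressions;

public static class RegexFolder
{
    public static string FoldAs(string input)
    {
        // First pass: any upper-case accented A becomes "A".
        string result = Regex.Replace(input, "[ÀÁÂÄ]", "A");
        // Second pass: the same for the lower-case variants.
        return Regex.Replace(result, "[àáâä]", "a");
    }
}
```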

Marcel N.