I need a way to convert special characters like this:

Helloæ

into plain characters, so that this word would end up as Helloae. So far I have tried HttpUtility.Decode and a method that converts UTF-8 to win1252, but neither worked. Is there something simple and generic that would do this job?

Thank you.

EDIT

I have tried implementing those two methods using posts here on SO. Here are the methods:

public static string ConvertUTF8ToWin1252(string _source)
{
    Encoding utf8 = new UTF8Encoding();
    Encoding win1252 = Encoding.GetEncoding(1252);

    byte[] input = _source.ToUTF8ByteArray();
    byte[] output = Encoding.Convert(utf8, win1252, input);

    return win1252.GetString(output);
}

// It should be noted that this method is expecting UTF-8 input only,
// so you probably should give it a more fitting name.
private static byte[] ToUTF8ByteArray(this string _str)
{
    Encoding encoding = new UTF8Encoding();
    return encoding.GetBytes(_str);
}

But it did not work. The string remains unchanged.

hsim
  • possible duplicate of [How do I remove diacritics (accents) from a string in .NET?](http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net) – James Jun 28 '13 at 14:30
  • probably just implement it yourself with a function with a switch inside – Jonesopolis Jun 28 '13 at 14:31
  • @James trying out the solution from the duplicate, will tell if it works. – hsim Jun 28 '13 at 14:34
  • @James The solution does not work for the string `Helloæ`. – Dustin Kingen Jun 28 '13 at 14:36
  • Well, @James, I have tried implementing the method in the duplicate, and it does not work. – hsim Jun 28 '13 at 14:36
  • @HerveS fair enough. I can't revoke my close vote unfortunately, regardless, it is still a duplicate question. Did you try some of the other answers on the question? There were more ways than one to do it. – James Jun 28 '13 at 14:39
  • Yeah, I have tried two similar ways posted in that question, but they do not work. Still looking for a way to do it; if you happen to know any, please feel free to offer. – hsim Jun 28 '13 at 14:42
  • Your comment "It should be noted that this method is expecting UTF-8 input only" does not apply, since your function doesn't take a byte array but a String object as input. String objects are independent of any encoding. Once you have converted a UTF-8 byte array into a string, it will be the same as any other string. – wborgsm Jun 28 '13 at 14:54
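
A minimal sketch of the point made in the last comment, assuming ISO-8859-1 as a stand-in for Windows-1252 (code page 1252 may need an extra encoding provider on newer .NET runtimes). The class and method names are illustrative and not from the question. Encoding a string to bytes, converting those bytes, and decoding them again just round-trips the same characters, which would explain why the string came back unchanged:

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Hypothetical helper: encode as UTF-8, convert to Latin-1, decode again.
    public static string ViaLatin1(string source)
    {
        Encoding utf8 = Encoding.UTF8;
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1"); // stand-in for 1252

        byte[] utf8Bytes = utf8.GetBytes(source);
        byte[] latin1Bytes = Encoding.Convert(utf8, latin1, utf8Bytes);

        // Decoding with the matching encoding yields the original characters,
        // so 'æ' (U+00E6) is still 'æ' in the returned string.
        return latin1.GetString(latin1Bytes);
    }
}
```

Calling `EncodingRoundTrip.ViaLatin1("Hello\u00E6")` should return a string equal to the input: the byte representation changes along the way, but the characters do not.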

2 Answers

See: Does .NET transliteration library exists?

UnidecodeSharpFork

Usage:

var result = "Helloæ".Unidecode();
Console.WriteLine(result); // Prints Helloae
Dustin Kingen

There is no direct mapping between æ and ae; they are completely different Unicode code points. If you need to do this, you'll most likely need to write a function that maps the offending code points to the strings that you desire.

Per the comments you may need to take a two stage approach to this:

  1. Remove the diacritics and combining characters per the link to the possible duplicate
  2. Map any characters left that are not combining to alternate strings
switch (badChar)
{
    case 'æ':
        return "ae";
    case 'ø':
        return "oe";
    // and so on
}
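
The two stages above could be combined roughly as follows. This is only a sketch under stated assumptions: the class and method names are hypothetical, the mapping table is deliberately tiny, and the upper-case entries are an addition for illustration. Step 1 uses `String.Normalize` with `NormalizationForm.FormD` and drops non-spacing marks; step 2 maps characters such as æ and ø, which have no decomposition, through a switch.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class Transliterator
{
    // Step 2: map characters that have no canonical decomposition.
    // Unmapped characters are returned unchanged.
    private static string MapChar(char c)
    {
        switch (c)
        {
            case '\u00E6': return "ae"; // æ
            case '\u00C6': return "AE"; // Æ (added for illustration)
            case '\u00F8': return "oe"; // ø
            case '\u00D8': return "OE"; // Ø (added for illustration)
            default: return c.ToString();
        }
    }

    public static string ToPlainLetters(string input)
    {
        // Step 1: decompose, e.g. 'é' becomes 'e' followed by U+0301.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            sb.Append(MapChar(c));
        }

        // Recompose anything that was left decomposed.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `Transliterator.ToPlainLetters("Hello\u00E6")` should give `Helloae`. A switch keeps the lookup simple; a `Dictionary<char, string>` would be an alternative if the table grows large or needs to be loaded from data.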
Mgetz
  • Yeah, that would be a solution. I must admit that I don't know how to do this. I'll update my post to show what I've done so far (which does not work). – hsim Jun 28 '13 at 14:31
  • `æ` is formed from the letters `ae`. – James Jun 28 '13 at 14:32
  • @James actually it's not, please look up the relevant Unicode, that is a distinct character U+00E6, as such there are no combining characters – Mgetz Jun 28 '13 at 14:35
  • I do not know how to do the second step. Can you guide me? – hsim Jun 28 '13 at 14:39
  • Ok, I think I see your point, and comments almost everywhere point to this solution. I think I'll implement something similar. I'll get back to you if it works. – hsim Jun 28 '13 at 14:46
  • Using a switch like in this answer should be faster than the dictionary I suggested. At the end of the switch add default: return badChar; – wborgsm Jun 28 '13 at 15:00