My scenario is:
- Create an email in Outlook Express and save it as .eml file;
- Read the file as string in C# console application;
I'm saving the .eml file encoded in utf-8. An example of text I wrote is:
- 'Goiânia é badalação.'
There are special characters like âéçã. It is portuguese characters. When I open the file with notepad++ the text is shown like this:
- 'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.'
If I open it in outook express again, it's shown normal, like the first way. When I read the file in console application, using utf-8 decoding, the string is shown like the second way.
The code I using is:
string text = File.ReadAllText(@"C:\fromOutlook.eml", Encoding.UTF8);
Console.WriteLine(text);
I tried all Encoding options and a lot of methods I found in the web but nothing works. Can someone help me do this simple conversion?
'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.' to 'Goiânia é badalação.'
string text = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";
byte[] bytes = new byte[text.Length * sizeof(char)];
System.Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Encoding.UTF8.GetString(bytes, 0, bytes.Length);
char[] chars = new char[bytes.Length / sizeof(char)];
System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
Console.WriteLine(new string(chars));
In this utf-8 table you can see the hex. value of these characters, 'é' == 'c3 a9': http://www.utf8-chartable.de/
Thanks.