My scenario is:

  • Create an email in Outlook Express and save it as an .eml file;
  • Read the file as a string in a C# console application.

I'm saving the .eml file encoded in utf-8. An example of text I wrote is:

  1. 'Goiânia é badalação.'

It contains special characters such as âéçã, which are Portuguese characters. When I open the file with Notepad++, the text is shown like this:

  1. 'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.'

If I open it in Outlook Express again, it is shown normally, like the first form. When I read the file in the console application using UTF-8 decoding, the string is shown like the second form.

The code I'm using is:

string text = File.ReadAllText(@"C:\fromOutlook.eml", Encoding.UTF8);
Console.WriteLine(text);

I tried every Encoding option and a lot of methods I found on the web, but nothing works. Can someone help me do this simple conversion?

'Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.' to 'Goiânia é badalação.'

    string text = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";

    // Copy the string's raw UTF-16 chars into a byte array.
    byte[] bytes = new byte[text.Length * sizeof(char)];
    System.Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Length);
    Encoding.UTF8.GetString(bytes, 0, bytes.Length);

    // Copy the bytes back into chars; this prints the input unchanged.
    char[] chars = new char[bytes.Length / sizeof(char)];
    System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
    Console.WriteLine(new string(chars));

In this UTF-8 table you can see the hex value of each of these characters, for example 'é' == 'c3 a9': http://www.utf8-chartable.de/
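
The hex values from the table can also be checked in code. The following is only a small sketch; `Utf8ByteCheck` and `ToUtf8Hex` are hypothetical names made up for this illustration, not framework APIs. It prints the UTF-8 bytes of a character, which are the same pairs that appear after each `=` in the .eml text.

```csharp
using System;
using System.Text;

static class Utf8ByteCheck
{
    // Hypothetical helper: returns the UTF-8 bytes of a string as hyphen-separated hex.
    public static string ToUtf8Hex(string s) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(s));

    static void Main()
    {
        // Escapes are used so the result does not depend on the source file's encoding.
        Console.WriteLine(ToUtf8Hex("\u00E9")); // é -> C3-A9
        Console.WriteLine(ToUtf8Hex("\u00E2")); // â -> C3-A2
    }
}
```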

Thanks.

RenanStr

2 Answers

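using System;
using System.Collections.Generic;
using System.Text;

var input = "Goi=C3=A2nia =C3=A9 badala=C3=A7=C3=A3o.";
var buffer = new List<byte>();
var i = 0;
while (i < input.Length)
{
    var character = input[i];
    if (character == '=')
    {
        // "=XX" encodes one byte as two hex digits.
        var part = input.Substring(i + 1, 2);
        buffer.Add(byte.Parse(part, System.Globalization.NumberStyles.HexNumber));
        i += 3;
    }
    else
    {
        // Any other character is plain ASCII and maps to a single byte.
        buffer.Add((byte)character);
        i++;
    }
}
// Decode as UTF-8 only after all escapes have been turned back into bytes.
var output = Encoding.UTF8.GetString(buffer.ToArray());
Console.WriteLine(output); // prints: Goiânia é badalação.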
trydis

Once I knew the problem was quoted-printable encoding, I found a good decoder here:

http://www.dpit.co.uk/2011/09/decoding-quoted-printable-email-in-c.html

This works for me.

Thanks folks.

Update: The above link is dead, here is a workable application:

How to convert Quoted-Print String
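
For readers who cannot follow either link, here is a minimal sketch of a quoted-printable decoder. It is not the code from the linked pages. `QuotedPrintableSketch` and `Decode` are hypothetical names, and the sketch assumes the body charset is UTF-8 and that the input contains only the ASCII text of the encoded body. Besides `=XX` escapes, it also drops soft line breaks (a `=` at the end of a line), which mail clients may insert when they wrap long lines.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class QuotedPrintableSketch
{
    // Hypothetical helper, not a framework API. Assumes a UTF-8 body charset.
    public static string Decode(string input)
    {
        var buffer = new List<byte>(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c != '=')
            {
                buffer.Add((byte)c); // assumed to be plain ASCII
                i++;
            }
            else if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 2; // soft line break "=\n": emit nothing
            }
            else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 3; // soft line break "=\r\n": emit nothing
            }
            else if (i + 2 < input.Length
                     && byte.TryParse(input.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                      CultureInfo.InvariantCulture, out byte value))
            {
                buffer.Add(value); // "=XX" is one byte in hex
                i += 3;
            }
            else
            {
                buffer.Add((byte)'='); // malformed escape: keep the '=' literally
                i++;
            }
        }
        return Encoding.UTF8.GetString(buffer.ToArray());
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // The "=\r\n" in the middle is a soft line break and disappears.
        Console.WriteLine(Decode("Goi=C3=A2nia =C3=A9 badala=C3=A7=\r\n=C3=A3o."));
        // prints: Goiânia é badalação.
    }
}
```

For a real .eml file the charset should be taken from the message's Content-Type header rather than assumed, and a mail-parsing library is likely the safer choice for anything beyond a quick conversion.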

Community
RenanStr
  • @IlliaRatkevych See update edit. I edited in the workable code: http://stackoverflow.com/questions/37540244/how-to-convert-quoted-print-string/37540375#37540375 – ib11 May 31 '16 at 23:24