19

I use Visual Studio 2010 and C# to read a Gmail inbox over IMAP. It works like a charm, but I think Unicode is not fully supported, as I cannot easily get Persian (Farsi) strings.

For instance, my string is سلام, but IMAP gives me "=?utf-8?B?2LPZhNin2YU=?=".

How can I convert it back to the original string? Any tips for converting UTF-8 to a string?

shA.t
Ali_dotNet
  • 2
    You're not *really* interested in UTF-8 - you're interested in something which can handle quoted printable... – Jon Skeet May 31 '12 at 07:02
  • 2
    http://stackoverflow.com/questions/454833/system-net-mail-and-utf-8bxxxxx-headers – Nahum May 31 '12 at 07:07
  • 2
    @JonSkeet: It's Base64 (`B`), not Quoted Printable (`Q`). – Heinzi May 31 '12 at 07:08
  • 1
    Voting to reopen. Base64 is *not* the same as Quoted-Printable. – Heinzi May 31 '12 at 07:11
  • @Heinzi - the dupe and its selected answer do cover B64 (and QP). – H H May 31 '12 at 10:10
  • 1
    @HenkHolterman: Which one? The duplicate originally contained in the answer (http://stackoverflow.com/questions/2226554) is about QP (although the answer there *might* also work for B64, that's not a reason for closing as a dup), and the duplicate mentioned in the comments (http://stackoverflow.com/questions/454833) is not concerned with *decoding* the string. – Heinzi May 31 '12 at 11:17

4 Answers

35

Let's have a look at the meaning of the MIME encoding:

=?utf-8?B?...something...?=
    ^   ^
    |   +--- The bytes are Base64 encoded
    |
    +---- The string is UTF-8 encoded

So, to decode this, take the ...something... out of your string (2LPZhNin2YU= in your case) and then

  1. reverse the Base64 encoding

    var bytes = Convert.FromBase64String("2LPZhNin2YU=");
    
  2. interpret the bytes as a UTF-8 string

    var text = Encoding.UTF8.GetString(bytes);
    

text should now contain the desired result.
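
The two steps above can be combined into one small helper. This is only a sketch: the class and method names (`EncodedWordDemo`, `DecodeBase64Word`) are made up for illustration, and it assumes a single, well-formed encoded-word that uses the `B` encoding.

```csharp
using System;
using System.Text;

static class EncodedWordDemo
{
    // Decodes a single "=?charset?B?payload?=" encoded-word.
    // Hypothetical helper for illustration; assumes well-formed Base64 input.
    public static string DecodeBase64Word(string word)
    {
        // Splitting on '?' yields: "=", charset, encoding, payload, "=".
        string[] parts = word.Split('?');
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=")
            throw new FormatException("Not a MIME encoded-word: " + word);
        if (!parts[2].Equals("B", StringComparison.OrdinalIgnoreCase))
            throw new NotSupportedException("Only the B (Base64) encoding is handled here.");

        byte[] bytes = Convert.FromBase64String(parts[3]);      // step 1: undo Base64
        return Encoding.GetEncoding(parts[1]).GetString(bytes); // step 2: bytes -> string
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so a console can show non-ASCII text
        Console.WriteLine(DecodeBase64Word("=?utf-8?B?2LPZhNin2YU=?="));
    }
}
```

Taking the charset from the string itself, rather than hard-coding UTF-8, means the same helper also copes with headers sent in other character sets that .NET knows about.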


This format is the MIME "encoded-word" syntax defined in RFC 2047; a description can also be found in the Wikipedia article on MIME.

Heinzi
3

What you have is a MIME-encoded string. .NET does not include a library for decoding it, so you can either implement the decoding yourself or use a third-party library.

Jirka Hanika
3

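Here it is:

    // Requires: using System; using System.Text; using System.Text.RegularExpressions;
    public static string Decode(string s)
    {
        // Replace each Base64 encoded-word (=?charset?B?data?=) with its decoded text;
        // anything outside an encoded-word is left untouched.
        return Regex.Replace(s ?? "", @"=\?([^?]+)\?[Bb]\?([^?]*)\?=", m =>
        {
            string charset = m.Groups[1].Value;
            string data = m.Groups[2].Value;
            byte[] b = Convert.FromBase64String(data);
            return Encoding.GetEncoding(charset).GetString(b);
        });
    }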
dima_horror
2

The following C# method decodes strings like "=?utf-8?B?..." or "=?utf-8?Q?..." into a normal string. The character encoding (such as "utf-8") is read from the string itself, and no regular expressions are used.

    // Requires: using System; using System.Collections.Generic; using System.Text;
    public static string DecodeQuotedPrintables(string InputText)
    {
        var ResultChars = new List<char>();
        Encoding encoding;
        for (int i= 0; i < InputText.Length; i++)
        {
            var CurrentChar = InputText[i];
            switch (CurrentChar)
            {
                case '=':
                    if((i + 1) < InputText.Length && InputText[i+1] == '?')
                    {
                        // Encoding
                        i += 2;
                        int StIndex = InputText.IndexOf('?', i);
                        int SubStringLength = StIndex - i;
                        string encodingName = InputText.Substring(i, SubStringLength);
                        encoding = Encoding.GetEncoding(encodingName);
                        i += SubStringLength + 1;

                        //Subencoding
                        StIndex = InputText.IndexOf('?', i);
                        SubStringLength = StIndex - i;
                        string SubEncoding = InputText.Substring(i, SubStringLength);
                        i += SubStringLength + 1;

                        //Text message
                        StIndex = InputText.IndexOf("?=", i);
                        SubStringLength = StIndex - i;
                        string Message = InputText.Substring(i, SubStringLength);
                        i += SubStringLength + 1;

                        // encoding
                        switch (SubEncoding.ToUpperInvariant()) // "b" and "q" are valid too
                        {
                            case "B":
                                var base64EncodedBytes = Convert.FromBase64String(Message);
                                ResultChars.AddRange(encoding.GetString(base64EncodedBytes).ToCharArray());

                                // skip space #1 (only when another encoded-word follows)
                                if ((i + 3) < InputText.Length && InputText[i + 1] == ' '
                                    && InputText[i + 2] == '=' && InputText[i + 3] == '?')
                                {
                                    i++;
                                }
                                break;

                            case "Q":
                                var CharByteList = new List<byte>();
                                for (int j = 0; j < Message.Length; j++)
                                {
                                    var QChar = Message[j];
                                    switch (QChar)
                                    {
                                        case '=':
                                            j++;
                                            string HexString = Message.Substring(j, 2);
                                            byte CharByte = Convert.ToByte(HexString, 16);
                                            CharByteList.Add(CharByte);
                                            j += 1;
                                            break;

                                        default:
                                            // Decode charbytes #1
                                            if (CharByteList.Count > 0)
                                            {   
                                                var CharString = encoding.GetString(CharByteList.ToArray());
                                                ResultChars.AddRange(CharString.ToCharArray());
                                                CharByteList.Clear();
                                            }

                                            // In the Q encoding, '_' stands for a space
                                            ResultChars.Add(QChar == '_' ? ' ' : QChar);
                                            break;
                                    }
                                }

                                // Decode charbytes #2
                                if (CharByteList.Count > 0)
                                {
                                    var CharString = encoding.GetString(CharByteList.ToArray());
                                    ResultChars.AddRange(CharString.ToCharArray());
                                    CharByteList.Clear();
                                }
                                
                                // skip space #2 (only when another encoded-word follows)
                                if ((i + 3) < InputText.Length && InputText[i + 1] == ' '
                                    && InputText[i + 2] == '=' && InputText[i + 3] == '?')
                                {
                                    i++;
                                }
                                break;

                            default:
                                throw new NotSupportedException("Decode quoted printables: unsupported subencoding: '" + SubEncoding + "'");
                        }
                    }
                    else
                        ResultChars.Add(CurrentChar);
                    break;

                default:
                    ResultChars.Add(CurrentChar);
                    break;
            }
        }

        return new string(ResultChars.ToArray());
    }
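
For readers who only want to see how the `Q` branch works, here is a compact sketch of that step on its own. The names `QEncodingDemo` and `DecodeQ` are hypothetical, and the code assumes that every `=` in the payload is followed by exactly two hexadecimal digits; it is an illustration, not a replacement for the method above.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class QEncodingDemo
{
    // Decodes the payload of a "=?charset?Q?payload?=" encoded-word.
    // Hypothetical helper; assumes every '=' is followed by two hex digits.
    public static string DecodeQ(string charset, string payload)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < payload.Length; i++)
        {
            char c = payload[i];
            if (c == '=')
            {
                // "=XX" is one byte written as two hexadecimal digits.
                bytes.Add(Convert.ToByte(payload.Substring(i + 1, 2), 16));
                i += 2;
            }
            else if (c == '_')
            {
                bytes.Add((byte)' '); // '_' stands for a space in Q-encoded text
            }
            else
            {
                bytes.Add((byte)c);   // any other character is plain ASCII
            }
        }
        // Decode all bytes at once so multi-byte characters are never split.
        return Encoding.GetEncoding(charset).GetString(bytes.ToArray());
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so a console can show non-ASCII text
        // The hex pairs are the UTF-8 bytes of the string from the question.
        Console.WriteLine(DecodeQ("utf-8", "=D8=B3=D9=84=D8=A7=D9=85_Ali"));
    }
}
```

Collecting every byte first and converting once at the end avoids decoding a multi-byte character in two halves, which is the same concern the "Decode charbytes" steps address in the method above.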
Alatey