Consecutive control characters in Quoted Printable not decoding correctly

Question

I have a mail processing engine that reads in emails (usually UTF-8 encoded) and processes them. I found a neat solution here for how to interpret the control characters, but that answer was given in 2011, and it seems that something has changed since then. The code in the referenced answer uses a Regex to find anything of the form =A0 (or another hex number) and decodes each match individually. But take this string:

Elke=E2=80=99s motto

I fed this into an encode/decode test site and this correctly decoded as

Elke’s motto

But that little apostrophe seems to be generated by a combination of three control codes. The code I have takes each code in isolation, and the result comes out as three separate, unreadable characters.

What code can I use to convert these special characters into the correct human-readable format?

This is "quoted printable", not UTF8. See this answer: http://stackoverflow.com/questions/3289944/utf8-quoted-printable-conversion-in-c-sharp-question/3289987#3289987 — Juderb, Aug 18 '15 at 22:14
@Juderb Indeed, my mistake. Will re-title question. But the question remains the same. The referenced answer doesn't offer any code showing how to do this. — Shaul Behr, Aug 18 '15 at 22:39
That's a perfectly valid encoding for U+2019, "Right single quotation mark", the preferred way to encode an apostrophe in typography. Your assumption that these encodings should appear as a single "=A0"-style code is just wrong. That often works for simple diacritics, just not for higher codepoint values like U+2019. — Hans Passant, Aug 18 '15 at 23:09
@HansPassant I'm not arguing about the acceptability of the encoding; I'm trying to get some working code to do the decoding. Can you please refer me to some code that will do the job? — Shaul Behr, Aug 19 '15 at 06:52
This [looks like a duplicate](https://stackoverflow.com/questions/8795801/decoding-quoted-printable-message) but this question has a bounty. — Kevin Brown-Silva, Aug 22 '15 at 18:47
Does this answer your question? [C#: Class for decoding Quoted-Printable encoding?](https://stackoverflow.com/questions/2226554/c-class-for-decoding-quoted-printable-encoding) — Jon Schneider, Mar 19 '20 at 18:19

score 2 · Accepted Answer · edited May 23 '17 at 11:52

Here is a piece of code I found on SO while looking for quoted-printable decoding:

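// Requires: using System; using System.Collections.Generic; using System.Text;
private static string Decode(string input, string bodycharset)
{
    var i = 0;
    var output = new List<byte>();
    while (i < input.Length)
    {
        if (input[i] == '=' && i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
        {
            // Soft line break ("=\r\n"): skip it, it is not part of the content
            i += 3;
        }
        else if (input[i] == '=' && i + 2 < input.Length)
        {
            // "=XX": one raw byte written as two hex digits
            string sHex = input.Substring(i + 1, 2);
            output.Add(Convert.ToByte(sHex, 16));
            i += 3;
        }
        else
        {
            // Literal character (also a trailing '=' with fewer than two characters after it);
            // quoted-printable input is expected to be ASCII here
            output.Add((byte)input[i]);
            i++;
        }
    }
    // Decode all collected bytes in one go so that multi-byte sequences stay together
    if (String.IsNullOrEmpty(bodycharset))
        return Encoding.UTF8.GetString(output.ToArray());
    else
        return Encoding.GetEncoding(bodycharset).GetString(output.ToArray());
}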

Source: Decoding Quoted printable message

Decode("Elke=E2=80=99s motto", "utf-8") -> Elke’s motto
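The reason this works where a per-code replacement does not is that the bytes are collected first and converted to text only once at the end. As a minimal sketch of the difference (the class and member names `QpDemo`, `Apostrophe`, `DecodeEachByte` and `DecodeTogether` are hypothetical, made up for this illustration, and the per-byte variant only approximates what a Regex-based decoder might do):

```csharp
using System;
using System.Linq;
using System.Text;

public static class QpDemo
{
    // The three bytes that "=E2=80=99" stands for (hypothetical demo data).
    public static readonly byte[] Apostrophe = { 0xE2, 0x80, 0x99 };

    // Problematic: convert every byte to text on its own.
    // None of these bytes is a complete UTF-8 sequence by itself,
    // so the result is three unreadable characters instead of one.
    public static string DecodeEachByte(byte[] bytes)
    {
        return string.Concat(bytes.Select(b => Encoding.UTF8.GetString(new[] { b })));
    }

    // Intended: hand the whole byte sequence to the decoder at once,
    // so the three bytes are read as the single code point U+2019.
    public static string DecodeTogether(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Under these assumptions, `QpDemo.DecodeTogether(QpDemo.Apostrophe)` gives the single character ’ (U+2019), while `QpDemo.DecodeEachByte(QpDemo.Apostrophe)` gives three characters, none of which is the apostrophe.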
