
I'm having some trouble with email encoding. I am reading an HTML file from disk and sending it through Gmail. When I open the HTML in a browser, it looks great. When I copy the HTML string from Visual Studio and save it as an HTML file, it also looks great. But when I receive the email, it contains a bunch of invalid characters; even the list bullets are messed up! I'm sure this is an encoding issue, but the file is encoded as UTF-8 and looks fine until it's converted to RAW and sent through Gmail.

Here is the process. We read a .docx file using the Open XML SDK, then use HtmlConverter to save the document as HTML. Later, the HTML is read back from the file, converted to RAW format, and sent through the Gmail API.

Here are some relevant code snippets:

This is where we save our HTML file using HtmlConverter.

HtmlConverterSettings settings = new HtmlConverterSettings()
{
    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
    FabricateCssClasses = true,
    RestrictToSupportedLanguages = false,
    RestrictToSupportedNumberingFormats = false,
};

XElement htmlElement = HtmlConverter.ConvertToHtml( wdWordDocument, settings );
var html = new XDocument(
    new XDocumentType( "html", null, null, null ),
    htmlElement );

var htmlString = html.ToString( SaveOptions.DisableFormatting );
File.WriteAllText( destFileName.FullName, htmlString, Encoding.UTF8 );

This is where we read the stored HTML and convert it for sending via Gmail. (We use MimeKit for the conversion.)

// Create the message using MimeKit/System.Net.Mail.MailMessage
MailMessage msg = new MailMessage();
msg.Subject = strEmailSubject; // Subject
msg.From = new MailAddress( strUserEmail ); // Sender
msg.To.Add( new MailAddress( row.email ) ); // Recipient
msg.BodyEncoding = Encoding.UTF8;
msg.IsBodyHtml = true; 

// We need to loop through our HTML Document and replace the images with a CID so that they will display inline
var vHtmlDoc = new HtmlAgilityPack.HtmlDocument();
vHtmlDoc.Load( row.file ); // Read the body, from HTML file
...
msg.Body = vHtmlDoc.DocumentNode.OuterHtml;

// Convert our System.Net.Mail.MailMessage to RAW with Base64 encoding for Gmail
MimeMessage mimeMessage = MimeMessage.CreateFromMailMessage( msg );

Google.Apis.Gmail.v1.Data.Message message = new Google.Apis.Gmail.v1.Data.Message();
message.Raw = Base64UrlEncode( mimeMessage.ToString() );
var result = vGMailService.Users.Messages.Send( message, "me" ).Execute();

And this is how we are base64 encoding:

private static string Base64UrlEncode( string input )
{
    var inputBytes = System.Text.Encoding.UTF8.GetBytes( input );
    // Special "url-safe" base64 encode.
    return Convert.ToBase64String( inputBytes )
        .Replace( '+', '-' )
        .Replace( '/', '_' )
        .Replace( "=", "" );
}

The email ends up as "Content-Type: multipart/mixed" with two alternatives. One is

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

and the other is

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Both the plain-text and the HTML parts contain strings like =C3=A2=E2=82=AC=E2=84=A2 where an apostrophe should be, and the HTML part contains an HTML header with weird "3D" characters in it:

<meta charset=3D"UTF-8"><title></title><meta name=3D"Generator"=
 content=3D"PowerTools for Open XML">

None of this weirdness was in the HTML prior to converting to Base64 and sending.

Any ideas what the problem could be? Does this have anything to do with UTF-8 and MimeKit?

halfer
mack
  • I can't answer your question, but it gets my upvote for effort. – adv12 May 05 '17 at 15:45
  • Why are you replacing parts of your Base64 string? I don't understand what the comment `Special "url-safe" base64 encode` means. – Equalsk May 05 '17 at 16:09
  • Did you check if the output from mimekit checks out with RFC 2822? Since that's what GMail API docs says it's needed if you work with Raw. – Alex Paven May 05 '17 at 16:12
  • @Equalsk, Base64 encoded strings aren't URL safe because they can include '+' and '/' characters. [http://stackoverflow.com/questions/13195143/range-of-valid-character-for-a-base-64-encoding/13195218] – mack May 05 '17 at 17:24
  • I doubt the API uses a URL to receive data via parameters. You can't just chop bits out of a base64 string. – Equalsk May 05 '17 at 17:56
  • Do not use MimeMessage.ToString () - you need to use MimeMessage.WriteTo(Stream) to write it to a MemoryStream or something. The ToString() method converts the entire message into a string using the iso-8859-1 encoding no matter what encoding the text parts of the message use (note: each text part can use a different charset). – jstedfast May 06 '17 at 00:45
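
The string round trip described in the last comment can be illustrated without MimeKit. This is a simplified sketch, not a reproduction of the exact bytes in the question: it assumes bytes are decoded to a string with ISO-8859-1 (as the comment says `ToString()` does) and then turned back into bytes with UTF-8 (as `Base64UrlEncode` does). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo class: shows what happens when bytes are decoded with
// one encoding (ISO-8859-1) and re-encoded with another (UTF-8).
public static class EncodingMismatchDemo
{
    // bytes -> string via ISO-8859-1 (per the comment, what ToString() does),
    // then string -> bytes via UTF-8 (what Base64UrlEncode in the question does).
    public static byte[] RoundTripThroughLatin1String(byte[] original)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string asText = latin1.GetString(original);
        return Encoding.UTF8.GetBytes(asText);
    }

    // Formats bytes as space-separated uppercase hex, e.g. "E2 80 99".
    public static string Hex(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("X2")));

    public static void Main()
    {
        // UTF-8 encoding of a right single quotation mark (U+2019).
        byte[] original = Encoding.UTF8.GetBytes("\u2019");
        byte[] mangled = RoundTripThroughLatin1String(original);

        Console.WriteLine(Hex(original)); // E2 80 99
        Console.WriteLine(Hex(mangled));  // C3 A2 C2 80 C2 99
    }
}
```

Every byte above 0x7F comes back as two bytes, so any 8-bit content that passes through a string this way is corrupted; writing the message to a stream and encoding those bytes directly avoids the round trip.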

2 Answers


The answer to your question is: there is no problem. This is simply how the raw message is presented, with quoted-printable encoding. Gmail presents it the same way if you send an email and look at its source.

brandon927
  • Thanks @brandon927 So how would I go about making the text show properly in the email? – mack May 05 '17 at 17:14

This is what your code should look like to get the "raw" message data for use with Google's APIs:

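// Serialize the MimeMessage to its raw bytes instead of calling ToString()
using (var stream = new MemoryStream ()) {
    mimeMessage.WriteTo (stream);

    // Base64url-encode the exact bytes MimeKit wrote (no string round trip)
    var buffer = stream.ToArray ();
    var base64 = Convert.ToBase64String (buffer)
        .Replace( '+', '-' )
        .Replace( '/', '_' )
        .Replace( "=", "" );

    // 'message' is the Google.Apis.Gmail.v1.Data.Message from the question
    message.Raw = base64;
}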
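
On the concern in the comments that characters are being "chopped" out: swapping '+' and '/' for '-' and '_' and dropping the '=' padding is the base64url variant described in RFC 4648, section 5, and it is reversible. The hypothetical `Base64Url` helper below is a minimal sketch of that round trip; the decoder restores the standard alphabet and the padding before calling `Convert.FromBase64String`.

```csharp
using System;
using System.Linq;

// Hypothetical helper illustrating base64url (RFC 4648, section 5).
public static class Base64Url
{
    // Standard Base64, then '+' -> '-', '/' -> '_', trailing '=' removed.
    public static string Encode(byte[] data) =>
        Convert.ToBase64String(data)
            .Replace('+', '-')
            .Replace('/', '_')
            .TrimEnd('=');

    // Reverses Encode: restore the standard alphabet, then re-add padding
    // so the length is a multiple of 4 before decoding.
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
            case 1: throw new FormatException("Invalid base64url length.");
        }
        return Convert.FromBase64String(s);
    }

    public static void Main()
    {
        // Bytes chosen so standard Base64 contains '+', '/', and padding.
        byte[] original = { 0xFB, 0xFF, 0xBF, 0xE0 };

        string standard = Convert.ToBase64String(original); // "+/+/4A=="
        string url = Encode(original);                       // "-_-_4A"
        bool same = Decode(url).SequenceEqual(original);     // true

        Console.WriteLine($"{standard} -> {url} -> round trip ok: {same}");
    }
}
```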

As brandon927 pointed out, the content of the text/html MIME part has been quoted-printable encoded. This is a MIME transfer encoding that makes sure the content fits within the 7-bit ASCII range for transport.

You will need to decode this in order to get the original HTML.
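
To show what the escapes in the question stand for, here is a minimal, illustrative quoted-printable decoder (RFC 2045, section 6.7). It is a hypothetical sketch, not MimeKit's implementation, and it ignores finer points such as trailing-whitespace rules. Applied to the question's samples, "=3D" decodes to "=", the "=" at the end of a line is a soft line break that disappears, and =C3=A2=E2=82=AC=E2=84=A2 decodes to "â€™", which is the familiar garbled form of a right single quotation mark (U+2019) whose UTF-8 bytes E2 80 99 were read as Windows-1252.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal quoted-printable decoder, for illustration only.
public static class QuotedPrintable
{
    public static byte[] Decode(string input)
    {
        var bytes = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c != '=')
            {
                bytes.Add((byte)c); // valid QP text is ASCII, so this cast is safe
                i++;
            }
            else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 3; // soft line break "=\r\n": removed
            }
            else if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 2; // soft line break "=\n": removed
            }
            else if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16)); // "=XX" hex escape
                i += 3;
            }
            else
            {
                bytes.Add((byte)'='); // malformed sequence: keep the '=' literally
                i++;
            }
        }
        return bytes.ToArray();
    }

    public static void Main()
    {
        string[] samples =
        {
            "<meta charset=3D\"UTF-8\">",                                         // -> <meta charset="UTF-8">
            "<meta name=3D\"Generator\"=\n content=3D\"PowerTools for Open XML\">", // -> <meta name="Generator" content="PowerTools for Open XML">
            "=C3=A2=E2=82=AC=E2=84=A2",                                           // -> "\u00E2\u20AC\u2122" (â€™)
        };

        // The decoded bytes are UTF-8, matching charset=UTF-8 in the MIME headers.
        foreach (string s in samples)
            Console.WriteLine(Encoding.UTF8.GetString(Decode(s)));
    }
}
```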

With MimeKit, this is done for you if you either use mimeMessage.HtmlBody or cast the MimeEntity representing the text/html part to a TextPart and access its Text property.

jstedfast
  • Thank you so much @jstedfast! I spent hours on this last week. I replaced the line `message.Raw = Base64UrlEncode( mimeMessage.ToString() )` with your code and now the emails show correctly in Gmail and in my email client. I would probably never have figured this out. : ) – mack May 08 '17 at 15:32