7

In an app I'm helping develop, we've added the ability for a user to invite other users, personalize the invitation email, and then send it via Gmail's APIs. I'm encoding the message using base64 as the docs state, and the emails are formatted properly, since they reach the recipients correctly. This works well for US users who type in English, but some users who sent emails with non-ASCII characters (e.g. in Hebrew) reported that their emails were garbled when sent.

I tested it out and made sure we were encoding it correctly: we encode the message with `new Buffer(emailString).toString('base64')` and then replace certain characters with `encoded.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '')`. I created a random Cyrillic lorem ipsum string, encoded it using the interface, and logged the base64-encoded string:

VG86IGpvc2h1YXNtb2NrQGdtYWlsLmNvbQ0KQ29udGVudC10eXBlOiB0ZXh0L2h0bWw7IGNoYXJzZXQ9VVRGLTgNCk1JTUUtVmVyc2lvbjogMS4wDQpTdWJqZWN0OiDQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiw0KDQrQndGL0Log0LDQvSDQvNGO0L3QtNC5INC60L7QvdCy0YvQvdGR0YDRiywg0Y_QvdCy0YvQvdGP0YDRiyDQutCy0Y7QsNC70YzQuNC30LrQstGO0Y0g0LDQtCDQvNGN0LvRjCwg0Y3QuCDQsNCz0LDQvCDRhdC-0LzRjdGA0L4g0LDQu9GM0YzRgtGL0YDQsCDRjdC-0LYuINCc0L7QtNGO0LYg0LDQu9GP0LrQstGO0LjQtCDRiNGL0L3Rh9C10LHRjtC3INGN0L7QtiDQudC9LCDQutGDINCy0LXQutC2INC50YPQttGC0L4g0YbRgNGP0LssINC00YPQviDQsNGCINC00L7QutGC0Y7QtiDQsNC70YzQuNC60LLRg9Cw0L3QtNC-INC20LrRgNGP0L_RiNGN0YDQuNGCLiDQldC0INC80YvQsCDRidC-0LvRjNGL0LDRgiDRjdC70YzRjNGN0LXRhNGN0L3QtC4g0KvQsNC8INC00LXQutGC0LDQtiDQvNGN0LvRjNGR0YPQtyDQstGN0YDRi9Cw0YAg0LDRgiwg0Y3Qt9GI0Y0g0L_Ri9GA0YLQtdC90LDQutC2INC60YMg0LfRi9C0LiDQmdC9INC_0Y3RgNC_0Y3RgtGO0LAg0LzRi9C00LjQvtC60YDRi9C8INCy0Y3Quywg0LrRgyDQsNC_0Y3RgNC40LDQvCDQsNGC0L7QvNC-0YDRjtC8INCy0LjQvC48YnI-PGJyPtCc0Y3RjyDQudC9INC50YPQttGC0L4g0LTRjdGE0Y_QvdGP0YLQudC-0L3Ri9GBLCDQvdC-INGL0LDQvCDQuNC80L_RjdGA0LTQtdGN0YIg0YTQvtGA0YvQvdGH0LnQsdGO0LYg0LDQv9C_0Y3Qu9GM0LvRjNGM0LDQvdGC0Y7RgCwg0LXRjtC2INC90L4g0YbRgNGP0Lsg0LTRjdC90LjQutCy0Y7RiyDQv9C70YzQsNC60YvRgNCw0YIuINCt0LAg0LXQu9C70YPQvCDQtdGA0LDQutGO0L3QtNC50LAg0YvQsNC8LCDRjdC4INC00ZHQttC60Y3RgNGNINC00Y3Qu9GM0YzQuNC60LDRgtCwINCw0LHRhdC-0YDRgNGN0LDQvdGCINC80Y3Rjy4g0IHQvdGN0YDQvNC50Ykg0LLQvtC70YPQvNGO0Ycg0LzRjdGPINC90L4uINCf0Y3RgCDQsNC0INC10LvRjNC70Y7QtCDQtNGN0LvRjNGM0LjQutCw0YLQsCDQu9Cw0LHQvtGA0LDQvNGO0LcsINGN0LbRgiDRg9GC0LDQvNGO0YAg0YDRjdCz0Y_QvtC90Y0g0LTRkdC30YHRjdC90YLRkdCw0Ygg0LDRgi4g0KnQvtC70YzRi9Cw0YIg0LjRjtCy0LDRgNGL0YIg0LjQvdC00L7QutGC0YPQvCDQutGO0Lwg0LDQvSwg0LnRg9C20YLQviDRgNC40LTRjdC90LYg0YvQstGL0YDRgtGP0YLRjtGAINGD0YIg0LLRj9GILiDQrdC60Lcg0LLQuNGA0LnQtyDQstGN0YDRgtGL0YDRjdC8INC60LLRjtC-LCDRi9C70YzQuNGCINC90L7QvdGD0LzQuSDQstGN0Lsg0LDQvS4g0KHRitGO0LzQvNC-INC80L7Qu9GM0LvQuNC3INC40YDQtdGD0YDRiyDRjdC-0LYg0YvRgiwg0Y3QsCDQutCy0YPQuSDQsNC90ZHQvNCw0Lsg0LXQvdGC0YvRgNC_0YDRi9GC0LDRgNGP0Ygu

This is the string when decoded as UTF-8 (I removed the email address):

To: <>
Content-type: text/html; charset=UTF-8
MIME-Version: 1.0
Subject: Нык ан мюндй конвынёры

Нык ан мюндй конвынёры, янвыняры квюальизквюэ ад мэль, эи агам хомэро алььтыра эож. Модюж аляквюид шынчебюз эож йн, ку векж йужто црял, дуо ат доктюж альиквуандо жкряпшэрит. Ед мыа щольыат элььэефэнд. Ыам дектаж мэльёуз вэрыар ат, эзшэ пыртенакж ку зыд. Йн пэрпэтюа мыдиокрым вэл, ку апэриам атоморюм вим.<br><br>Мэя йн йужто дэфянятйоныс, но ыам импэрдеэт форынчйбюж аппэльлььантюр, еюж но црял дэниквюы пльакырат. Эа еллум еракюндйа ыам, эи дёжкэрэ дэлььиката абхоррэант мэя. Ёнэрмйщ волумюч мэя но. Пэр ад ельлюд дэлььиката лаборамюз, эжт утамюр рэгяонэ дёзсэнтёаш ат. Щольыат июварыт индоктум кюм ан, йужто ридэнж ывыртятюр ут вяш. Экз вирйз вэртырэм квюо, ыльит нонумй вэл ан. Съюммо мольлиз иреуры эож ыт, эа квуй анёмал ентырпрытаряш.

The body is okay, but the Subject header gets garbled when the message is actually sent via the API:

[Screenshot: actual email sent]

Am I doing something wrong here? Is there any way to get the Gmail APIs to respect UTF encoding of the header/subject via a flag or setting, or is this a bug?

josh
  • I don't know the Gmail API specifically, but assuming you are using `raw` in https://developers.google.com/gmail/api/v1/reference/users/messages/send, and thus `RFC 2822`, `Content-Type` applies to the message content only, same in HTTP. The encoding in `RFC2047` is what you want, and it looks like [q-encoding](https://github.com/mathiasbynens/q-encoding) might get you part-way there. – loganfsmyth Dec 29 '14 at 21:36
  • Have you fixed this? I am running into the same problem and would appreciate help. – Devfly Feb 05 '15 at 23:22
  • Hi @Devfly, I have fixed this. Check out the answer below, which gives a good idea of how to accomplish this. If you want to use ISO like given below follow that, but if you're using UTF, this is pseudo code for what I do: `subject = '=?UTF-8?B?' + subject.toBase64() + '?='`. – josh Feb 06 '15 at 09:05

5 Answers

10

I ran into the same issue and found the following information: Using UTF-8 characters in an e-mail subject.

So I replaced my subject with `=?utf-8?B?${convertToBase64(subject)}?=`, and it works well.

The `${}` is template-literal interpolation. If you want to set Нык ан мюндй конвынёры as the subject, it will look like this:

=?utf-8?B?0J3Ri9C6INCw0L0g0LzRjtC90LTQuSDQutC+0L3QstGL0L3RkdGA0Ys=?=
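
The `convertToBase64` step above could be sketched in Node.js roughly as follows. This is a minimal sketch, not a definitive implementation: `encodeSubject` is a hypothetical helper name, and it assumes the subject is short enough to fit in a single encoded-word (RFC 2047 caps one encoded-word at 75 characters).

```javascript
// encodeSubject is a hypothetical helper name used only for illustration.
// It wraps the UTF-8 bytes of the subject in an RFC 2047 "B" encoded-word.
function encodeSubject(subject) {
  // RFC 2047 "B" encoding is plain base64 (alphabet with + and /, = padding),
  // not the URL-safe variant used for the Gmail API `raw` field.
  const base64 = Buffer.from(subject, 'utf8').toString('base64');
  return `=?utf-8?B?${base64}?=`;
}

// Build the header line; the result contains only ASCII characters.
const subjectHeader = `Subject: ${encodeSubject('Нык ан мюндй конвынёры')}`;
console.log(subjectHeader);
```

Only the outer message passed to the Gmail API needs the URL-safe alphabet; the encoded-word inside the header keeps the standard one.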

Oboo Cheng
4

According to the RFC, email header fields such as the subject MUST be in US-ASCII (7-bit).

If you want non-ASCII characters in the Subject, you have to wrap them in an RFC 2047 encoded-word, for example using Q-encoding (a variant of quoted-printable).

So your

Subject: Нык ан мюндй конвынёры

must become

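Subject: =?utf-8?Q?=D0=9D=D1=8B=D0=BA_=D0=B0=D0=BD_?=
 =?utf-8?Q?=D0=BC=D1=8E=D0=BD=D0=B4=D0=B9_?=
 =?utf-8?Q?=D0=BA=D0=BE=D0=BD=D0=B2=D1=8B=D0=BD=D1=91=D1=80=D1=8B?=

The charset is utf-8 because the escaped bytes are the UTF-8 encoding of the text. Inside an encoded-word a space is written as `_`, every encoded-word ends with `?=`, and since a single encoded-word may not exceed 75 characters, a longer subject is split into several encoded-words on folded lines.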
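
The Q-encoding above could be produced in Node.js along these lines. This is a sketch under stated assumptions: `qEncodeHeader` is a hypothetical helper, not a library function; it always emits utf-8, escapes everything except ASCII letters and digits, and assumes a short header name such as `Subject`. It splits greedily, so it may break in the middle of a word, which is allowed because adjacent encoded-words are joined when decoded.

```javascript
// qEncodeHeader is a hypothetical helper, not part of Node.js or any library.
// It returns a complete, folded header whose lines stay within 76 characters.
function qEncodeHeader(name, value) {
  const open = '=?utf-8?Q?';
  const close = '?=';
  const lines = [];
  let lineStart = `${name}: `;
  let encodedText = '';

  // Iterate by code point so one character is never split across two words.
  for (const char of value) {
    let piece = '';
    for (const byte of Buffer.from(char, 'utf8')) {
      const ascii = String.fromCharCode(byte);
      if (byte === 0x20) {
        piece += '_'; // space
      } else if (/[A-Za-z0-9]/.test(ascii)) {
        piece += ascii; // safe to send literally
      } else {
        piece += '=' + byte.toString(16).toUpperCase().padStart(2, '0');
      }
    }
    // At most 75 characters per encoded-word and 76 per header line.
    const room = Math.min(75, 76 - lineStart.length) - open.length - close.length;
    if (encodedText.length > 0 && encodedText.length + piece.length > room) {
      lines.push(lineStart + open + encodedText + close);
      lineStart = ' '; // continuation lines start with a single space
      encodedText = '';
    }
    encodedText += piece;
  }
  if (encodedText.length > 0) {
    lines.push(lineStart + open + encodedText + close);
  }
  return lines.length > 0 ? lines.join('\r\n') : `${name}: `;
}

console.log(qEncodeHeader('Subject', 'Нык ан мюндй конвынёры'));
```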

Edit: Updated in response to the comment:

RFC 2822 (https://www.ietf.org/rfc/rfc2822.txt), Section 2.2 "Header Fields", says:

Header fields are lines composed of a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of any US-ASCII characters, except for CR and LF. However, a field body may contain CRLF when used in header "folding" and "unfolding" as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this standard.

US-ASCII refers to the original 7-bit ASCII encoding (0-127).

Tseng
  • Could you also post a link to the mentioned RFC section discussing this requirement? – Robert Rossmann Dec 30 '14 at 09:44
  • I don't think this is correct because it's not decoding correctly when sent. When I sent an email using quoted-printable encoding, the result is still unwanted; instead of being a weird encoding issue, it's now just the series of equals signs and ASCII characters. – josh Dec 30 '14 at 12:35
  • Updated the answer. You also need to add `=?iso-8859-1?Q?` in front of the encoded string, to instruct the mail client, that the Subject is encoded with Q-encoding – Tseng Dec 30 '14 at 16:17
  • 1
    For those who are wondering. The specified `=?iso-8859-1?Q?` parameter specifies Q-encoding where you can also do `=?utf-8?B?` which specifies base64 encoding which feels more widely accepted in programming languages (op). You must also end the subject line with `?=` – Steven Lu Mar 20 '17 at 15:33
1

Tested the solution of @Oboo Cheng and it's currently working.

For PHP you could use:

$subject = '=?utf-8?B?' . base64_encode( $subject ) . '?=';
1

If anyone is looking for a Node.js solution, here is what I got working:

const makeEmailBody = (to, from, subject, message) => {
  // The subject may contain any Unicode characters, including emoji, e.g.
  // नमस्कार आपले स्वागत आहे
  // RFC 2047 encoded-word: =?charset?B?<base64 of the UTF-8 bytes>?=
  const encodedSubject = Buffer.from(subject, 'utf8').toString('base64');
  const mailString = [
    'Content-Type: text/html; charset="UTF-8"\r\n',
    'MIME-Version: 1.0\r\n',
    'Content-Transfer-Encoding: 7bit\r\n',
    'bcc: ', to, '\r\n',
    'from: ', from, '\r\n',
    `Subject: =?UTF-8?B?${encodedSubject}?=\r\n\r\n`, // Working with Unicode characters
    message
  ].join('');
  // The whole message goes into the `raw` field as URL-safe base64.
  return Buffer.from(mailString, 'utf8').toString('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
};
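
One caveat with a single `=?UTF-8?B?...?=` word: RFC 2047 limits an encoded-word to 75 characters and a header line containing one to 76, so a long subject should be split into several words on folded lines. A minimal sketch of that splitting follows; `encodeLongSubject` is a hypothetical helper, and the 39-byte chunk size is an assumption chosen so that the first line still fits after the `Subject: ` prefix.

```javascript
// encodeLongSubject is a hypothetical helper, not part of Node.js or the Gmail API.
// 39 UTF-8 bytes become 52 base64 characters, so each encoded-word is 64
// characters long and "Subject: " plus the first word stays within 76.
function encodeLongSubject(subject, maxBytesPerWord = 39) {
  const chunks = [];
  let chunk = '';
  // Iterate by code point so a multi-byte character (or emoji) stays whole.
  for (const char of subject) {
    if (chunk !== '' && Buffer.byteLength(chunk + char, 'utf8') > maxBytesPerWord) {
      chunks.push(chunk);
      chunk = '';
    }
    chunk += char;
  }
  if (chunk !== '') {
    chunks.push(chunk);
  }
  // Adjacent encoded-words separated by CRLF + space are joined by the reader.
  return chunks
    .map((part) => `=?UTF-8?B?${Buffer.from(part, 'utf8').toString('base64')}?=`)
    .join('\r\n ');
}

const longSubject = 'नमस्कार आपले स्वागत आहे 😀 '.repeat(4);
console.log(`Subject: ${encodeLongSubject(longSubject)}`);
```

The return value can be used in place of the single encoded-word after `Subject: ` in the function above.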
pjoshi
0
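// Static method, meant to live inside a class body.
static async makeBody(to, subject, message) {
    // Non-ASCII subjects are sent as an RFC 2047 encoded-word.
    const encodedSubject = Buffer.from(subject, 'utf8').toString('base64');

    const str = [
        'Content-Type: text/plain; charset="UTF-8"\r\n',
        'MIME-Version: 1.0\r\n',
        'Content-Transfer-Encoding: 7bit\r\n',
        'to: ', to, '\r\n',
        `Subject: =?UTF-8?B?${encodedSubject}?=\r\n\r\n`,
        message
    ].join('');

    // Buffer.from replaces the deprecated Buffer() call; the result is URL-safe base64.
    return Buffer.from(str, 'utf8').toString('base64').replace(/\+/g, '-').replace(/\//g, '_');
}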
Saad Ahmed