I want to send email messages that have arbitrary Unicode bodies in a Python 3.2 program. In practice, though, these messages will consist largely of 7-bit ASCII text, so I would like them encoded in utf-8 using quoted-printable. So far, I've found that this works, but it seems wrong:
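import email.charset
import email.message
c = email.charset.Charset('utf-8')
c.body_encoding = email.charset.QP  # quoted-printable instead of the base64 default
m = email.message.Message()
# Hack: pass the UTF-8 bytes through as a str of Latin-1 code points
m.set_payload("My message with an '\u05d0' in it.".encode('utf-8').decode('iso8859-1'), c)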
This results in an email message with exactly the right content:
To: someone@example.com
From: someone_else@example.com
Subject: This is a subjective subject.
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

My message with an '=D7=90' in it.
In particular, b'\xd7\x90'.decode('utf-8') results in the original Unicode character, so the quoted-printable encoding is properly rendering the utf-8. I'm well aware that this is an incredibly ugly hack. But it works.
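As a sanity check, here is a minimal sketch that decodes the body line back to text. It uses the standard-library quopri module, which is my choice for the check and not part of the program above; the variable names are only illustrative.

```python
import quopri

original = "My message with an '\u05d0' in it."
body = b"My message with an '=D7=90' in it."

# Undo the quoted-printable layer, then interpret the raw bytes as UTF-8
decoded = quopri.decodestring(body).decode('utf-8')
print(decoded == original)

# Going the other way: quoted-printable applied directly to the UTF-8 bytes
encoded = quopri.encodestring(original.encode('utf-8'))
print(encoded == body)
```

Note that quopri works on bytes throughout, so no Latin-1 detour is needed for this check.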
This is Python 3. Text strings are expected to always be Unicode. I shouldn't have to encode it to utf-8. And then turning it from bytes back into str by .decode('iso8859-1') is a horrible hack, and I shouldn't have to do that either.
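My guess at why the hack works, sketched below: ISO-8859-1 maps each byte to the code point with the same numeric value, so the resulting str is just the UTF-8 bytes in disguise, and every character stays below 256. That the quoted-printable encoder only copes with such characters is my assumption based on the out-of-range error, not something I have confirmed in the module source.

```python
text = "My message with an '\u05d0' in it."
utf8_bytes = text.encode('utf-8')

# ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same value
smuggled = utf8_bytes.decode('iso8859-1')
code_points = [ord(ch) for ch in smuggled]

print(code_points == list(utf8_bytes))  # the str mirrors the bytes one-to-one
print(max(code_points) < 256)           # no character is outside the byte range

# The detour is lossless: reversing both steps restores the original text
restored = smuggled.encode('iso8859-1').decode('utf-8')
print(restored == text)
```

The smuggled string is one character longer than the original, because the single character '\u05d0' becomes the two bytes 0xD7 0x90 in UTF-8.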
Is the email module just broken with respect to encodings? Am I not getting something?
I've attempted to just plain old set it, with no character set. That leaves me with a Unicode email message, and that's not right at all. I've also tried leaving off the encode and decode steps. If I leave them both off, it complains that the \u05d0 is out-of-range when trying to decide if that character needs to be quoted in the quoted-printable encoding. If I leave in just the encode step, it complains bitterly about how I'm passing in a bytes and it wants a str.