Why does this Python program send empty emails when I encode it with utf-8?

Question

Before encoding the msg variable, I was getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xfc' in position 4: ordinal not in range(128)

So I did some research, and finally encoded the variable:

msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
server.sendmail('...@gmail.com', '...@gmail.com', msg)

Here's the rest of the code on request:

def remind_me(path, time, day_freq):

    for filename in glob.glob(os.path.join(path, '*.docx')):
        # file_count = sum(len(files))
        # random_file = random.randint(0, file_number-1)
        doc = docx.Document(filename)
        p_number = len(doc.paragraphs)

        text = ''
        while text == '':
            rp = random.randint(0, p_number-1) # random paragraph number
            text = doc.paragraphs[rp].text # gives the entire text in the paragraph

        base = os.path.basename(filename)
        print(os.path.splitext(base)[0] + ': ' + text)
        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login('...@gmail.com', 'password')
        msg = (os.path.splitext(base)[0] + ': ' + text).encode('utf-8')
        server.sendmail('...@gmail.com', '...@gmail.com', msg)
        server.quit()

Now, it sends empty emails instead of delivering the message. Does it return None? If so, why?

Note: Word documents contain some characters like ş, ö, ğ, ç.

Did you check/print `msg` before sending it? Is it an empty string by chance? — DYZ, Feb 01 '18 at 04:58
I don't see how `msg` could ever end up containing a valid email message with this code. What's in `text` and why do you attempt to encode it in `utf-8` in the first place? — tripleee, Feb 01 '18 at 05:52
@DYZ I checked msg variable by printing it. It's working just fine :( — Ali, Feb 01 '18 at 16:52
@tripleee msg variable contains some text from the word documents I have. When I don't encode it, it gives me aforementioned UnicodeEncodeError. — Ali, Feb 01 '18 at 16:52
That doesn't help *at all.* What does it contain? Apparently nothing remotely like a well-formed email message? — tripleee, Feb 01 '18 at 17:01
@tripleee they contain characters like ş,ö,ğ,ç. I thought this was the problem. — Ali, Feb 01 '18 at 17:03

tripleee · Accepted Answer · 2022-07-15T08:09:09.393

The msg argument to smtplib.SMTP.sendmail should be a bytes sequence containing a valid RFC5322 message (a string is accepted too, but only if it is pure ASCII). Taking a string and encoding it as UTF-8 is very unlikely to produce one (if it's already ASCII, encoding it does nothing useful; and if it isn't, you are most probably Doing It Wrong).
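
As a rough illustration of why the mail probably arrives empty, here is how Python's own email parser reads a payload shaped like the one in the question. The name `notes` and the sample text are made up, and Gmail's handling is only assumed to be similar: with no blank line separating headers from body, the single `name: text` line is taken as a header, and nothing is left over for the body.

```python
import email

# Hypothetical stand-ins for os.path.splitext(base)[0] and text
name = "notes"
text = "Hëlló ş ö ğ ç"

raw = (name + ": " + text).encode("utf-8")

# Parse the bytes roughly the way a mail reader might
parsed = email.message_from_bytes(raw)

header_names = parsed.keys()   # the whole line ends up as one header
body = parsed.get_payload()    # nothing remains for the body

print(header_names)
print(repr(body))
```

If the file name contained a space, the line would not look like a header and the parser would behave differently, so treat this as a sketch of the likely cause rather than a diagnosis.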

To explain why that is unlikely to work, let me provide a bit of background. The way to transport non-ASCII strings in MIME messages depends on the context of the string in the message structure. Here is a simple message with the word "Hëlló" embedded in three different contexts which require different encodings, none of which accept raw UTF-8 easily.

From: me <sender@example.org>
To: you <recipient@example.net>
Subject: =?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"

--fooo
Content-type: text/plain; charset="utf-8"
Content-transfer-encoding: quoted-printable

H=C3=ABll=C3=B3 is bare quoted-printable (RFC2045),
like what you see in the Subject header but without
the RFC2047 wrapping.

--fooo
Content-type: application/octet-stream; filename*=UTF-8''H%C3%ABll%C3%B3

This is a file whose name has been RFC2231-encoded.

--fooo--
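
The three encodings in the message above can be reproduced with Python's standard library. This is only a sketch for illustration, and the variable names are mine. Note that `Header` uses the base64 (`B`) flavor of RFC2047 for UTF-8 by default, not the `Q` flavor shown in the Subject line above, so the code checks the header by decoding it again.

```python
import quopri
from email.header import Header, decode_header, make_header
from email.utils import encode_rfc2231

word = "Hëlló"

# RFC2047 encoded-word, for use in headers such as Subject
subject_value = Header(word, "utf-8").encode()
# Round trip to check that the encoded-word decodes back to the original
decoded_subject = str(make_header(decode_header(subject_value)))

# Bare quoted-printable (RFC2045), for use in a body part
body_qp = quopri.encodestring(word.encode("utf-8"))

# RFC2231 encoding, for use in parameters such as a file name
filename_value = encode_rfc2231(word, charset="utf-8")

print(subject_value)
print(body_qp)
print(filename_value)
```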

There are recent extensions which allow for parts of messages between conforming systems to contain bare UTF-8 (even in the headers!) but I have a strong suspicion that this is not the scenario you are in. Maybe tangentially see also https://en.wikipedia.org/wiki/Unicode_and_email

Returning to your code, I suppose it could work if base is coincidentally also the name of a header you want to add to the start of the message, and text contains a string with the rest of the message. You are not showing enough of your code to reason intelligently about this, but it seems highly unlikely. And if text already contains a valid MIME message, encoding it as UTF-8 should not be necessary or useful (but it clearly doesn't, as you get the encoding error).

Let's suppose base contains Subject and text is defined thusly:

text='''=?utf-8?Q?H=C3=ABll=C3=B3?= (RFC2047 encoding)
MIME-Version: 1.0
Content-type: multipart/mixed; boundary="fooo"
....'''

Now, the concatenation base + ': ' + text actually produces a message similar to the one above (though I reordered some headers to put Subject: first for this scenario) but again, I imagine this is not how things actually are in your code.

If your goal is to send an extracted piece of text as the body of an email message, the way to do that is roughly

import smtplib
from email.message import EmailMessage

body_text = os.path.splitext(base)[0] + ': ' + text

message = EmailMessage()
message.set_content(body_text)
message["subject"] = "Extracted text"
message["from"] = "you@example.net"
message["to"] = "me@example.org"

with smtplib.SMTP("smtp.gmail.com", 587) as server:
    # ... smtplib setup, login, authenticate?
    server.send_message(message)
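
Before sending anything, it can help to look at what `EmailMessage` actually generates. The following sketch uses made-up sample text and the same example addresses, and never touches the network; it only checks that a non-ASCII subject and body survive being serialized and parsed again.

```python
import email
from email import policy
from email.message import EmailMessage

# Made-up sample text standing in for the extracted paragraph
body_text = "notes: Hëlló ş ö ğ ç"

message = EmailMessage()
message.set_content(body_text)
message["subject"] = "Extracted text: Hëlló"
message["from"] = "you@example.net"
message["to"] = "me@example.org"

# Serialize to the bytes that would be handed to the SMTP server
raw = message.as_bytes()
header_block = raw.split(b"\n\n", 1)[0]

# Parse the bytes back, as a receiving client would
received = email.message_from_bytes(raw, policy=policy.default)

print(header_block.decode("ascii"))
print(received["subject"])
print(received.get_content())
```

The header block comes out as pure ASCII because the library applies RFC2047 encoding to the subject for you, while the body is labeled with its character set.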

This answer was updated for the current email library API; the text below the line is the earlier code from the original answer.

The modern EmailMessage API (provisional in Python 3.4, official since 3.6) rather straightforwardly translates into human concepts, unlike the older API which required you to understand many nitty-gritty details of how the MIME structure of your message should look.

import smtplib
from email.mime.text import MIMEText

body_text = os.path.splitext(base)[0] + ": " + text
sender = "you@example.net"
recipient = "me@example.org"

message = MIMEText(body_text)
message["subject"] = "Extracted text"
message["from"] = sender
message["to"] = recipient
server = smtplib.SMTP("smtp.gmail.com", 587)
# ... smtplib setup, login, authenticate?
# sendmail takes the envelope sender and recipient, then the serialized message
server.sendmail(sender, recipient, message.as_string())

The MIMEText() invocation builds an email object with room for a sender, a subject, a list of recipients, and a body; its as_string() method returns a representation which looks roughly similar to the ad hoc example message above (though simpler still, with no multipart structure) which is suitable for transmitting over SMTP. It transparently takes care of putting in the correct character set and applying suitable content-transfer encodings for non-ASCII header elements and body parts (payloads).
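
To see what the older API produces, here is a small sketch along the same lines (the sample body is made up, and nothing is sent). As far as I know, `MIMEText` picks UTF-8 on its own when the text is not plain ASCII and applies a content-transfer encoding so that the serialized message is ASCII-only.

```python
from email.mime.text import MIMEText

sample_body = "Hëlló ş ö ğ ç"  # made-up sample text

message = MIMEText(sample_body)
message["subject"] = "Extracted text"
message["from"] = "you@example.net"
message["to"] = "me@example.org"

# The serialized message, as it would be passed to sendmail
wire_text = message.as_string()
print(wire_text)

# The encoded payload decodes back to the original text
decoded_body = message.get_payload(decode=True).decode("utf-8")
```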

Python's standard library contains fairly low-level functions so you have to know a fair bit in order to connect all the pieces correctly. There are third-party libraries which hide some of this nitty-gritty; but you would expect anything with email to have at the very least both a subject and a body, as well as of course a sender and recipients.

I'll acknowledge that this doesn't strictly speaking answer your question, because we don't know enough about your variables; but at least I hope it should help you [edit] your question into something we can reason about. — tripleee, Feb 01 '18 at 06:04
Thank you, I've just edited it. Could you check it again? I hope it would be easier to understand the situation now. — Ali, Feb 01 '18 at 17:00
Not sure I'm guessing your intent correctly but see update now. — tripleee, Feb 01 '18 at 17:18
Good comprehensive answer! But it lacks information about what encoding should be used in the names (not addresses!) of recipients (and sender), so could you possibly add that? For example, if the recipient is "Mr Åäö ", how would this be encoded? I'm currently having a problem with this (see here: https://stackoverflow.com/questions/58253420/how-to-encode-international-characters-in-recipient-names-not-addresses-with-s ), and I cannot seem to find any information about this online, and interestingly also, no one is answering my question about this for some reason?!? — QuestionOverflow, Oct 06 '19 at 22:15
Sounds like you are looking for RFC2047. But sure, I'll take a look at your question. The proper way to attract attention to an unanswered question is to post a bounty, though. — tripleee, Oct 07 '19 at 05:27
