
I am attempting to write a script that scrapes some text off a website and then sends said text to me via email.

All of it is working as desired except for the encoding. The email contains lines such as this:

We say, ???Well, it???s all over and ruined now; what???s the

Obviously, each "???" should be a quotation mark or an apostrophe. I'm not very familiar with how encoding works, especially as it pertains to email, so any help would be appreciated. The pertinent part of my script is below:

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
msg['From'] = fromaddr
msg['To'] = toaddrs
msg['Subject'] = "Daily Utmost Devo"

# webtext, cleanverse, & cleanlink are all <type 'unicode'> at this point

body = webtext.encode('utf-8')
bodyverse = cleanverse.encode('utf-8')
bodylink = cleanlink.encode('utf-8')
msg.attach(MIMEText(body, 'plain'))
msg.attach(MIMEText(bodyverse, 'plain'))
msg.attach(MIMEText(bodylink, 'plain'))

username = 'xxxxx@gmail.com'
password = 'xxxxx'

server = smtplib.SMTP('smtp.gmail.com:587')
server.ehlo()
server.starttls()
server.ehlo()
server.login(username, password)
text = msg.as_string()
server.sendmail(fromaddr, toaddrs, text)
server.quit()
Extinct23

1 Answer


MIMEText takes a _charset parameter:

class email.mime.text.MIMEText(_text[, _subtype[, _charset]])

Module: email.mime.text

A subclass of MIMENonMultipart, the MIMEText class is used to create MIME objects of major type text. _text is the string for the payload. _subtype is the minor type and defaults to plain. _charset is the character set of the text and is passed as a parameter to the MIMENonMultipart constructor; it defaults to us-ascii. If _text is unicode, it is encoded using the output_charset of _charset, otherwise it is used as-is.

Changed in version 2.4: The previously deprecated _encoding argument has been removed. Content Transfer Encoding now happens implicitly based on the _charset argument.

Unless the _charset parameter is explicitly set to None, the MIMEText object created will have both a Content-Type header with a charset parameter, and a Content-Transfer-Encoding header. This means that a subsequent set_payload call will not result in an encoded payload, even if a charset is passed in the set_payload command. You can “reset” this behavior by deleting the Content-Transfer-Encoding header, after which a set_payload call will automatically encode the new payload (and add a new Content-Transfer-Encoding header).

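So try passing 'utf-8' as the charset, so that the headers of each part declare the encoding you actually used: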

msg.attach(MIMEText(body, 'plain', 'utf-8'))
msg.attach(MIMEText(bodyverse, 'plain', 'utf-8'))
msg.attach(MIMEText(bodylink, 'plain', 'utf-8'))
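To check the effect without sending anything, here is a minimal sketch that builds one part and inspects its headers. The sample string is made up for illustration, and the sketch assumes the unicode text is passed to MIMEText directly rather than pre-encoded with .encode('utf-8'); per the documentation quoted above, unicode input is encoded using the given charset.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical sample text containing curly quotes (U+201C and U+2019).
webtext = u"We say, \u201cWell, it\u2019s all over and ruined now"

msg = MIMEMultipart()
msg['Subject'] = "Daily Utmost Devo"

# An explicit charset makes the part declare UTF-8 in its Content-Type header.
part = MIMEText(webtext, 'plain', 'utf-8')
msg.attach(part)

# Inspect what a mail client will see in the part's headers.
print(part['Content-Type'])
print(part['Content-Transfer-Encoding'])

# Round trip: undo the transfer encoding, then decode the bytes as UTF-8.
decoded = part.get_payload(decode=True).decode('utf-8')
print(decoded == webtext)
```

If the declared charset and the round-tripped text look right here, the remaining "???" characters are more likely to come from the scraping step than from the email construction.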

EDIT: Also see these posts:

MIMEText UTF-8 encode problems when sending email

Python - How to send utf-8 e-mail?

Encoding of headers in MIMEText

Peter Gibson
  • Thanks! Worked perfectly. As a side note, do you know of any resources on formatting the text within the received email (i.e. font, size, etc.)? Thank you again. – Extinct23 Feb 19 '14 at 03:11
  • @Extinct23 you should look into HTML formatting. Email allows both an HTML-formatted and a plain-text version of a message to be sent at the same time. I'm not sure how Python handles this, though. – Peter Gibson Feb 19 '14 at 03:14
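
For the HTML question in the comments, one common approach with the standard library is a multipart/alternative message that carries a plain-text part and an HTML part. The sketch below is only an illustration: the body strings and the inline style are hypothetical, and it builds the message without sending it.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical bodies; the HTML version carries the font and size styling.
text_body = u"It\u2019s a plain-text fallback."
html_body = (u'<html><body><p style="font-family: Georgia; font-size: 14pt;">'
             u"It\u2019s <b>formatted</b>.</p></body></html>")

# 'alternative' tells the client that the parts are versions of the same content.
msg = MIMEMultipart('alternative')
msg['Subject'] = "Daily Utmost Devo"

# Parts go in increasing order of preference: plain text first, HTML last.
msg.attach(MIMEText(text_body, 'plain', 'utf-8'))
msg.attach(MIMEText(html_body, 'html', 'utf-8'))

content_types = [p.get_content_type() for p in msg.get_payload()]
print(content_types)
```

The resulting msg.as_string() can be passed to server.sendmail() exactly as in the question. Clients that cannot render HTML fall back to the plain-text part.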