My application uses the get_payload() method to retrieve the HTML of the message. The problem is that the retrieved HTML contains seemingly random sequences of \r, \t and \n, so the HTML in my application does not match what Gmail shows.

I carefully compared the HTML from Gmail and from my application. Where Gmail has an empty <td height="32"></td> tag, my application has what I guess is just a string of useless characters, like in the image below. In the email itself there is just blank space, or nothing, in place of those characters. Any idea why I am getting this?

Note: This also happens with other emails, even ones that contain only plain text.

[Image: the stray characters as rendered in my application]

The following is the Python code I am using:

import email
import email.header
import datetime
import imaplib
import sys
from pprint import pprint

imap_host = 'imap.gmail.com'
imap_user = 'someEmail@gmail.com'
imap_pass = 'somePassword'

diction = []


def process_mailbox(m):

    rv, data = m.search(None, "ALL")
    if rv != 'OK':
        print('No messages found!')
        return

    for num in data[0].split():
        rv, data = m.fetch(num, '(RFC822)')
        if rv != 'OK':
            print("ERROR getting message", num)
            return

        msg = email.message_from_bytes(data[0][1])
        hdr = email.header.make_header(email.header.decode_header(msg['Subject']))
        subject = str(hdr)
        print('Message %s: %s' % (num, subject))

        # date_tuple = email.utils.parsedate_tz(msg['Date'])
        # if date_tuple:
        #   local_date = datetime.datetime.fromtimestamp(email.utils.mktime_tz(date_tuple))
        #   print('Local Date:', local_date.strftime('%a, %d %b %Y %H:%M:%S'))
        for part in msg.walk():
            if part.get_content_type() == 'text/html':
                # print(part.get_payload(decode=True))
                diction.append({'body': part.get_payload(decode=True)})
    return diction


M = imaplib.IMAP4_SSL('imap.gmail.com')

try:
    rv, data = M.login(imap_user, imap_pass)
except imaplib.IMAP4.error:
    print("LOGIN FAILED!")
    sys.exit(1)

# print(rv, data)

rv, mailboxes = M.list()
if rv == 'OK':
    print('Mailboxes:')
    print(mailboxes)

rv, data = M.select('Inbox')
if rv == 'OK':
    print('Processing mailbox...\n')
    process_mailbox(M)
    M.close()
else:
    print('ERROR: Unable to open mailbox', rv)
    M.logout()

Here is the Flask code:

from flask import Flask, render_template, url_for
from forms import RegistrationForm, LoginForm

import email_client


a = email_client.diction

app = Flask(__name__)


@app.route('/test')
def test():
    return render_template('test.html', text=a)


@app.route('/')
@app.route('/email')
def home():
    return render_template('home.html')


@app.route('/about')
def about():
    return render_template('about.html', title='About')


@app.route('/register')
def register():
    form = RegistrationForm()
    return render_template('register.html', title='Register', form=form)


if __name__ == '__main__':
    app.run(debug=True)

And the HTML:

{% for t in text %}
<div class="card content-section">
    <div class="card-body">
        {{ t.body|safe }}
    </div>
</div>
{% endfor %}

Edit:

I added the Markup import and changed the for loop that reads the body of the message to:

        for part in msg.walk():
            if part.get_content_type() == 'text/html':
                value = Markup(part.get_payload(decode=True))
                print(value)
                diction.append({'body': value})

  • Please post code snippets; the output alone won't work. – Gaurav Dec 31 '18 at 03:00
  • Where is the Flask part that renders the e-mail? – rfkortekaas Dec 31 '18 at 08:04
  • I think you need to encode it in some format before sending it there. You are now sending it as a string in UTF-8 format, but I guess IMAP uses bytes or some other character encoding. – Gaurav Dec 31 '18 at 10:09
  • @rfkortekaas I will include it in an edit, but I don't think that's where the problem is. The bad HTML is already received in this Python code. If I print part.get_payload(decode=True), it prints the bad HTML. – Elonas Marcauskas Dec 31 '18 at 14:21
  • @Gaurav I am already using part.get_payload(decode=True). I had issues before because I did not use decode=True before. So if I were to encode it, should I try to encode it into HTML? – Elonas Marcauskas Dec 31 '18 at 14:23
  • I am not sure, actually, because I can hardly see the Flask part in your code. Why don't you use Flask-Mail? – Gaurav Dec 31 '18 at 14:29
  • @Gaurav As far as I understand, Flask-Mail is for sending only (as described at https://pythonhosted.org/Flask-Mail/). The Flask code is simple: I am just passing a list to the HTML file and looping over it. I need to retrieve emails from Gmail. Or am I unaware of what Flask-Mail can do? – Elonas Marcauskas Dec 31 '18 at 14:56
  • Yes you are right, let me try at my end – Gaurav Dec 31 '18 at 15:04
  • Have you tried without decode=True? I guess you need a string, not bytes, to render it into the HTML, because decode=True converts the string to bytes. – Gaurav Dec 31 '18 at 15:35
  • @Gaurav I see. I just checked the data types with Python's built-in type() function. Using decode=True gives bytes and decode=False gives a string. – Elonas Marcauskas Dec 31 '18 at 15:51
  • Yes, so just try without decode=True – Gaurav Dec 31 '18 at 15:53
  • If it solves then let me know. – Gaurav Dec 31 '18 at 16:00
  • I tried it, but it kind of sets me back. All the graphical stuff is gone when I remove decode=True. I asked for help in this thread https://stackoverflow.com/questions/53966633/flask-imap-application-giving-incorrect-html So if I am not converting the string into bytes and am leaving it as a string, I still need to convert it to HTML. Any clue how to do that? I am searching for how to do that right now. – Elonas Marcauskas Dec 31 '18 at 16:06
  • So now I am wondering whether this is a problem of conversion or how the data is retrieved in the first place. – Elonas Marcauskas Dec 31 '18 at 16:11
  • @Gaurav No it did not solve the problem – Elonas Marcauskas Dec 31 '18 at 16:49
  • https://stackoverflow.com/a/49572927/9214835 Read the answer there – Gaurav Dec 31 '18 at 17:10
  • Does it add something? – Gaurav Dec 31 '18 at 18:09
  • @Gaurav I changed the for loop that reads the message's body by adding the Markup import that you suggested in your comment. The output is still the same, assuming I am using the import the right way. – Elonas Marcauskas Dec 31 '18 at 23:45
  • I tested the Markup, it only replaces the |safe in the HTML file. Nothing more. – Elonas Marcauskas Jan 01 '19 at 00:07
  • @ElonasMarcauskas Then sorry, I don't have a solution for that. I have never worked on IMAP applications, but I would like to know the solution. – Gaurav Jan 01 '19 at 01:07
  • @ElonasMarcauskas I tried this, and actually the output without debug=True is correct, because this is what you are getting from Gmail. – Gaurav Jan 01 '19 at 01:57

1 Answer

I found the solution. Decoding the bytes returned by get_payload into a string, as in

part.get_payload(decode=True).decode('utf-8')

will solve the problem, as long as the message is encoded as UTF-8.
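
For anyone wondering why the stray \r, \t and \n showed up at all, here is a minimal sketch of what most likely happened. It assumes a hand-built message with a made-up HTML fragment (HTML_BYTES and raw are illustrative, not taken from the question) so that it runs without an IMAP server. get_payload(decode=True) only undoes the transfer encoding (base64 here) and returns bytes; turning those bytes into text without decoding them gives the b'...' representation, in which line breaks and tabs appear as literal backslash sequences.

```python
import base64
import email

# Made-up HTML fragment standing in for a real message body (assumption).
HTML_BYTES = b'<td height="32">\r\n\t</td>'

# Build a minimal single-part message by hand so no IMAP server is needed.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/html; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(HTML_BYTES) + b"\r\n"
)
part = email.message_from_bytes(raw)

payload = part.get_payload(decode=True)  # bytes: only the base64 layer is undone
as_rendered = str(payload)               # text form of bytes: b'...' with literal \r\n\t
html = payload.decode("utf-8")           # real text with real CR, LF and TAB characters

print(type(payload).__name__)
print(as_rendered)
print(html)
```

Without decode=True, get_payload() returns the still base64-encoded text, which would explain why the formatting disappeared when decode=True was removed.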

Gaurav
  • Thank you! How did you figure it out? I just want to know your thought process and what you were looking for so I can use it later. – Elonas Marcauskas Jan 01 '19 at 02:28
  • Actually, I was sure it was a matter of encoding, because seeing escape sequences like that generally means the character encoding is not right, so I wanted to know how to check the character encoding. Then I came to this post: https://stackoverflow.com/questions/4987327/how-do-i-check-if-a-string-is-unicode-or-ascii There I saw the decode method and tried different encodings. – Gaurav Jan 01 '19 at 02:39
  • I see. Thank you for the explanation and solution to the problem. – Elonas Marcauskas Jan 01 '19 at 03:42
  • One more thing: never post a username and password in your next question. – Gaurav Jan 01 '19 at 05:26
  • True. I noticed a little too late. It is a test account but yeah shouldn't risk it. – Elonas Marcauskas Jan 01 '19 at 15:00
  • UTF8 is what *this* message uses. The next sender may have decided to use something else. – arnt Jan 04 '19 at 09:43
  • @arnt the one forwarded by imap or the web? – Gaurav Jan 04 '19 at 17:23
  • Mail senders decide which character encoding to use for the mail they send. If you decode mail, `decode('UTF-8')` will work as long as the sender also decided to use UTF-8, but fail if the sender encoded using latin1 or (let's get creative) hp-roman. – arnt Jan 04 '19 at 17:33
  • @arnt I totally agree, but what is the solution then? I guess he needs to check for every encoding. – Gaurav Jan 04 '19 at 18:12
  • @gaurav mail readers use the encoding that's specified in the message instead of hardcoding UTF-8. This stuff is 20+ years old already, it's not a new, challenging problem. – arnt Jan 04 '19 at 20:29
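
The point made in the last comments can be sketched roughly as follows: read the charset the sender declared in the part's Content-Type header and decode with that, instead of hardcoding UTF-8. The helper name decode_html_part and its fallback parameter are hypothetical, and the sample message is made up; get_content_charset() is the standard library call that reports the declared charset.

```python
import base64
import email
from email.message import Message


def decode_html_part(part: Message, fallback: str = "utf-8") -> str:
    """Decode a message part using the charset declared by the sender."""
    payload = part.get_payload(decode=True)
    if payload is None:  # multipart containers have no payload of their own
        return ""
    # Charset from the Content-Type header, or the fallback if none is declared.
    charset = part.get_content_charset() or fallback
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # the declared charset name is unknown to Python
        return payload.decode(fallback, errors="replace")


# Made-up message encoded as Latin-1, where a hardcoded UTF-8 decode would fail.
raw = (
    b'Content-Type: text/html; charset="iso-8859-1"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode("<p>café</p>".encode("iso-8859-1")) + b"\r\n"
)
msg = email.message_from_bytes(raw)
print(decode_html_part(msg))
```

In the question's loop this would replace the bare part.get_payload(decode=True) call. Another option, if the message is parsed with policy=email.policy.default, is part.get_content(), which returns text parts already decoded according to their declared charset.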