
I need to know how to extract a link from an HTML email in a Gmail account.

I am able to connect using the Python library for Gmail provided by Google, but when I use the function described in the example:

import base64
import email

from googleapiclient import errors


def GetMimeMessage(service, user_id, msg_id):
  """Get a Message and use it to create a MIME Message.

  Args:
    service: Authorized Gmail API service instance.
    user_id: User's email address. The special value "me"
    can be used to indicate the authenticated user.
    msg_id: The ID of the Message required.

  Returns:
    A MIME Message, consisting of data from Message.
  """
  try:
    message = service.users().messages().get(userId=user_id, id=msg_id,
                                             format='raw').execute()

    print('Message snippet: %s' % message['snippet'])

    # 'raw' holds the whole RFC 2822 message, base64url-encoded.
    msg_bytes = base64.urlsafe_b64decode(message['raw'].encode('ASCII'))

    # On Python 3 the decoded value is bytes, so parse it with
    # message_from_bytes (message_from_string expects a str).
    mime_msg = email.message_from_bytes(msg_bytes)

    return mime_msg
  except errors.HttpError as error:
    # Format inside print(); "print(...) % error" would fail on Python 3.
    print('An error occurred: %s' % error)

message = GetMimeMessage(service, 'me', '15876d11f0719f43')
#message = GetMessage(service, 'me', '15876d11f0719f43')
#message = str(message)
print(message)

The message looks like this:

    style=3D"font-family: Verdana, Geneva, sans-serif; font-siz=
e: 14px; line-height: 20px;"><a href=3D"https://e.vidaxl.com/1/4/1505/1/ZJb=
bJSDLWNxmfpHYIRQzlMiIupCb0wiKMrAxIrfXlymZQ_TK5GUcGAT6rIBJD9nfIFJ5XWG6HnYei-=
G1aQqlfnBxKnJ3yujKlOpRY2UxqroSHS51ofyXzr3kFa7OTyJH5zKbxESXzbTlcQOYxRuEnBcKF=
saVBGQXyJomUGLL6RY" target=3D"_blank" style=3D"color:blue;">SUIVEZ VOTRE CO=
MMANDE</a></td>=0A

As you can see, an "=" sign is added at the end of each line. I don't know why; is it because I print it?
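The trailing "=" signs are not caused by printing. The HTML part appears to use quoted-printable transfer encoding (RFC 2045): a "=" at the end of a line is a soft line break, "=3D" encodes a literal "=", and "=0A" encodes a newline. A minimal sketch of decoding it with the standard-library quopri module, using a shortened, hypothetical excerpt shaped like the output above (the example.com URL is a placeholder, not the real tracking link):

```python
import quopri

# Hypothetical excerpt in the same shape as the printed output.
# "=" at a line end is a soft line break; "=3D" is an encoded "=";
# "=0A" is an encoded newline.
encoded = (
    b'<a href=3D"https://example.com/track?id=3DAB=\n'
    b'CD" target=3D"_blank">SUIVEZ VOTRE CO=\n'
    b'MMANDE</a>=0A'
)

# decodestring joins the soft-broken lines and expands =XX escapes.
decoded = quopri.decodestring(encoded).decode("utf-8")
print(decoded)
```

When working from the parsed MIME message, part.get_payload(decode=True) applies this decoding automatically based on the part's Content-Transfer-Encoding header, so calling quopri directly is usually unnecessary.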

My main question is how to extract a specific link from a MIME email. I tried lxml without success; I think that is either because of those added "=" signs or because the content is not valid HTML or XML.
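One way to pull a specific link out of the message, sketched under a few assumptions: it walks the parts of a MIME message (such as the one returned by GetMimeMessage), decodes each text/html part with get_payload(decode=True) so the quoted-printable "=" artifacts are gone before any HTML parsing, and uses the standard-library html.parser instead of lxml, since it tolerates imperfect HTML. The helper names (LinkCollector, html_parts, find_link), the sample raw message and the example.com URL are hypothetical. Matching on the visible link text "SUIVEZ VOTRE COMMANDE" is just one choice; filtering on the href (for example its domain) would work the same way.

```python
import email
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs for <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_parts(mime_msg):
    """Yield the decoded text of every text/html part of a MIME message."""
    for part in mime_msg.walk():
        if part.get_content_type() == "text/html":
            # decode=True undoes the quoted-printable/base64 transfer encoding.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            yield payload.decode(charset, errors="replace")


def find_link(mime_msg, link_text):
    """Return the first href whose visible text matches link_text, else None."""
    for html_text in html_parts(mime_msg):
        collector = LinkCollector()
        collector.feed(html_text)
        collector.close()
        for href, text in collector.links:
            if text.upper() == link_text.upper():
                return href
    return None


# Hypothetical raw message with a quoted-printable HTML body.
raw = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/html; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b'<td><a href=3D"https://example.com/track?id=3DAB=\n'
    b'CD" target=3D"_blank">SUIVEZ VOTRE CO=\n'
    b"MMANDE</a></td>=0A\n"
)
msg = email.message_from_bytes(raw)
print(find_link(msg, "Suivez votre commande"))
```

With the function from the question, the call would look like find_link(GetMimeMessage(service, 'me', msg_id), 'SUIVEZ VOTRE COMMANDE'), keeping in mind that GetMimeMessage returns None when the API call fails.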

Thanks for your help

Andronaute
  • The example on the Google API page, https://developers.google.com/gmail/api/v1/reference/users/messages/get, doesn't use the "raw" attribute; have you tried the default ("full")? – transient_loop Nov 22 '16 at 15:31
  • Hi, aren't they using raw? I also tried with full, but I got an error saying there is no such key. – Andronaute Nov 22 '16 at 15:39
  • Try the [httplib — HTTP protocol client](https://docs.python.org/2/library/httplib.html) library from Python, which is meant for parsing HTTP responses, as demonstrated in this [SO thread](http://stackoverflow.com/questions/24728088/python-parse-http-response-string). Also try msg.get_payload(), which returns the current payload. Demo code is [here](http://stackoverflow.com/questions/24952039/python-google-api-getting-mimetypes-from-a-message). – ReyAnthonyRenacia Nov 23 '16 at 12:02
  • I used your response to find a solution to my issue. Thanks again. I will edit my question to post the updated code. – Andronaute Nov 25 '16 at 16:15
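The comments mention the "full" format. A missing-key error there is likely expected: with format='full' the response has no 'raw' field, and the body is instead nested under message['payload'], where each part carries a mimeType, an optional list of sub-parts, and body.data encoded as base64url. A minimal sketch that recursively finds the first text/html part, assuming UTF-8 content; find_html_body and the hand-built fake_message (standing in for a real API response) are hypothetical. My understanding is that the API already removes the quoted-printable transfer encoding in this format, so the "=" artifacts should not appear, but treat that as an assumption to verify against a real message.

```python
import base64


def find_html_body(payload):
    """Hypothetical helper: recursively search a 'full'-format payload.

    Returns the first text/html body as a str, or None if there is none.
    """
    if payload.get("mimeType") == "text/html":
        data = payload.get("body", {}).get("data")
        if data:
            # base64url; add '=' padding defensively in case it is missing.
            padded = data + "=" * (-len(data) % 4)
            return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for part in payload.get("parts", []):
        found = find_html_body(part)
        if found is not None:
            return found
    return None


# Hand-built stand-in for a multipart/alternative API response.
html_bytes = b'<a href="https://example.com/track?id=ABCD">SUIVEZ VOTRE COMMANDE</a>'
fake_message = {
    "payload": {
        "mimeType": "multipart/alternative",
        "parts": [
            {"mimeType": "text/plain",
             "body": {"data": base64.urlsafe_b64encode(b"plain").decode()}},
            {"mimeType": "text/html",
             "body": {"data": base64.urlsafe_b64encode(html_bytes).decode().rstrip("=")}},
        ],
    }
}
print(find_html_body(fake_message["payload"]))
```

Against the live API the input would be service.users().messages().get(userId='me', id=msg_id, format='full').execute()['payload'], and the returned string can then be handed to an HTML parser to pick out the link.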
