Finding hyperlinks in Gmail email body with IMAP

Question

The point of this script is to find hyperlinks in emails and automatically open them. I'm currently stuck on the search part.

The script can't seem to pick up the link from the body of the email. The hyperlink should look like this:

https://something.com/verify/c4b7668ad547922226426896f

Is something wrong with my regex?

import email
import re

def process_mailbox(M):
    rv, data = M.search(None, specific_email_addy)
    if rv != 'OK':
        print "No messages found!"
        return

    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print "ERROR getting message", num
            return

        # data[0][1] is the raw RFC822 message: headers plus every body part,
        # including the HTML and any alternate payloads
        raw_email = data[0][1]
        msg = email.message_from_string(raw_email)

        for part in msg.walk():
            # each part is either a non-multipart, or another multipart message
            # that contains further parts... Message is organized like a tree
            if part.get_content_type() == 'text/html':
                plain_text = part.get_payload()

                link_pattern = re.compile('<a[^>]+href=\'(.*?)\'[^>]*>(.*)?</a>')
                search = link_pattern.search(plain_text)
                if search is not None:
                    # search is a match object, so print the captured href
                    print("Link found! -> " + search.group(1))
                    break
                else:
                    print("No links were found.")
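
One thing that can be checked in isolation is the quoting in the pattern: as written, it only matches `href` values wrapped in single quotes. Below is a minimal sketch, assuming Python 3 and two made-up sample tags (the real email's markup is not shown here), that compares the pattern with a hypothetical variant accepting either quote style.

```python
import re

# The pattern from the script above: it only matches href='...' (single quotes).
original_pattern = re.compile('<a[^>]+href=\'(.*?)\'[^>]*>(.*)?</a>')

# Hypothetical variant that accepts either quote character. The backreference
# \1 requires the closing quote to be the same as the opening one.
either_quote_pattern = re.compile(
    r'<a[^>]+href=(["\'])(.*?)\1[^>]*>(.*?)</a>', re.DOTALL
)

# Made-up sample tags, not taken from the real email.
single_quoted = "<a class='btn' href='https://something.com/verify/abc'>Verify</a>"
double_quoted = '<a class="btn" href="https://something.com/verify/abc">Verify</a>'

for label, tag in (("single", single_quoted), ("double", double_quoted)):
    old = original_pattern.search(tag)
    new = either_quote_pattern.search(tag)
    print(label, "| original:", old.group(1) if old else None,
          "| variant:", new.group(2) if new else None)
```

If the email uses double-quoted attributes, the original pattern would return `None` even when the link is present. This does not rule out other causes, such as the body still being transfer-encoded.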

Hyperlinks or URLs? They are different things and your example (`https://something.com...`) is a URL whereas your regex is searching for an `<a>` tag. — Selcuk, Nov 08 '19 at 00:33
Does this answer your question? [regular expression for finding 'href' value of a link](https://stackoverflow.com/questions/15926142/regular-expression-for-finding-href-value-of-a-a-link) — Kent Shikama, Nov 08 '19 at 00:38
@Selcuk the url is wrapped in the tag, so I'm searching for the url inside the hyperlink. I'm ok with getting the hyperlink then parsing it myself for the url — chadlei, Nov 08 '19 at 05:37
@KentShikama kind of but I'm trying to figure out a way to make it work in python — chadlei, Nov 08 '19 at 05:42
You should then post an example of the tag you want to match. — Selcuk, Nov 08 '19 at 05:53
@Selcuk this is the exact tag I'm trying to get. I'm having trouble understanding why link_pattern isn't picking it up — chadlei, Nov 08 '19 at 05:56
so with BeautifulSoup I'm able to pick up u'3D"https://example.heyyo.=' but it's not picking up the rest of the URL and I'm guessing it's because the rest of it is on a new line — chadlei, Nov 08 '19 at 06:13
\n this is what i get when i run soup = bs4.BeautifulSoup(plain_text, features="html.parser") aTags = soup.find_all("a",href=True) — chadlei, Nov 08 '19 at 06:39
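
The `3D"` prefix and the trailing `=` in the BeautifulSoup output look like quoted-printable transfer encoding, where `=3D` encodes a literal `=` and a `=` at the end of a line is a soft line break. If that is the case here, `part.get_payload()` returns the still-encoded text, and `get_payload(decode=True)` reverses the encoding. The following is a minimal sketch, not a definitive fix: it assumes Python 3, uses a made-up sample message, and swaps in the standard library's `html.parser` for BeautifulSoup. The names `RAW_EMAIL`, `LinkCollector`, and `extract_links` are hypothetical.

```python
import email
from html.parser import HTMLParser

# Made-up raw message: one HTML part in quoted-printable encoding, with the
# URL split across two lines by a soft line break ("=" at end of line).
RAW_EMAIL = (
    "From: sender@example.com\n"
    "Subject: Verify\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    '<html><body><a href=3D"https://something.com/ver=\n'
    'ify/c4b7668ad547922226426896f">Verify</a></body></html>\n'
)


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw_email):
    """Return the href values found in the text/html parts of a raw email."""
    msg = email.message_from_string(raw_email)
    links = []
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        # decode=True reverses the Content-Transfer-Encoding and returns bytes.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        html = payload.decode(charset, errors="replace")
        collector = LinkCollector()
        collector.feed(html)
        links.extend(collector.links)
    return links


print(extract_links(RAW_EMAIL))
```

In the Python 2 script from the question, the corresponding change would be calling `part.get_payload(decode=True)` before searching the HTML. Using an HTML parser rather than a regex also avoids depending on which quote character the sender used.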
