Python, parsing links from poplib email message

Question

I'm struggling to extract links from email messages downloaded from my POP3 server.

I'm trying to parse the body of any email whose subject contains a passed-in subject string. I'm using the poplib module to get the emails from the server with the code below:

import poplib

def pop3_return_all_messages(user="", password="", pop_address="", subject=None):
    pop_conn = poplib.POP3_SSL(pop_address)
    pop_conn.user(user)
    pop_conn.pass_(password)
    #Get messages from server:
    for m in pop_conn.list()[1]:
        idx = int(m.split()[0])
        # we use top(), not retr(), so that the 'read'-flag isn't set
        head, body = parse_email(pop_conn.top(idx, 8192)[1]) #build dict of values for head and body
        if subject in head["subject"]: #if subject matches
            get_link(body)

The parsing of the email is done by parse_email, which returns a dictionary of header fields and a list of body lines:

def parse_email(lines):
    '''Splits an email provided as a list of lines into a dictionary of
    header fields and a list of body lines. Returns a (header, body)
    tuple.
    '''
    part = 0
    head = {}
    body = []
    for ln in lines:
        if part == 0:
            if not ln.strip():
                part = 1
                continue
            i = ln.find(':')
            if i <= 0:
                continue
            head[ln[:i].strip().lower()] = ln[i+1:].strip()
        else:
            body.append(ln)
    return head, body
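
As a point of comparison, the standard library's email package can do the same header/body split. The sketch below is only an illustration: parse_email_stdlib is a hypothetical name, and it assumes the lines are text (str) without line endings and that the message is not multipart. Under Python 3, where poplib returns bytes, email.message_from_bytes would be the counterpart.

```python
from email import message_from_string


def parse_email_stdlib(lines):
    """Hypothetical alternative to parse_email() using the email package.

    Assumes `lines` is a list of text lines without line endings and
    that the message is not multipart.
    """
    msg = message_from_string("\n".join(lines))
    # Lower-case the header names to mirror parse_email();
    # if a header is repeated, the last occurrence wins.
    head = dict((name.lower(), value) for name, value in msg.items())
    # For a non-multipart message the payload is a single string.
    body = msg.get_payload().splitlines()
    return head, body
```

Unlike a split on the first colon, the parser keeps a folded (continued) header line attached to its header rather than dropping it or treating it as a separate field.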

Here is get_link, which attempts to build up a list of links:

def get_link(body):
    def get_line(body):
        for item in body:
            yield item
    links = [] #empty list for links
    multipart_link = [] #empty list for multiline links
    for line in get_line(body):
        if "http" in line: #If a link has been found
            if ">" in line: #If that link ends on the same line (single line link)
                links.append(line) #add to links list
            else: #multiline link detected
                multipart_link.append(line) #add current line
                for item in xrange(1,10):
                    if ">" not in get_line(body):
                        multipart_link.append(line) #
                    else:
                        multipart_link.append(line)
                        print multipart_link
                        break #last part of multipart link, exit
                multi_link = "".join(multipart_link) #join up multipart link
                links.append(multi_link) #add to links
                multipart_link.pop() #clear multipart links
    return links

I can't get anything under this branch to work:

else: #multiline link detected

Essentially, I want to detect multiline links, which run over more than one item in the body list, and add each line up to the one where the link ends. The end is marked by a > character.

I've hit a brick wall here. I can get single-line links fine, but I'm struggling with multiline ones and would appreciate some help. Obviously I still need to clean up the generated links, but I can write a regex for that later.
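
For reference, here is a minimal sketch of the behaviour described above. It is not a definitive fix. In the posted get_link, every call to get_line(body) creates a fresh generator, so `">" not in get_line(body)` checks whether any whole line equals ">" instead of looking at the next line, and the same `line` is appended repeatedly. The sketch uses one shared iterator so that reading continuation lines also advances the outer loop. The name collect_links and the max_extra_lines parameter are hypothetical. The limit of 9 mirrors xrange(1,10), and the "http" and ">" checks are the same heuristic as in the question.

```python
def collect_links(body, max_extra_lines=9):
    """Return the links found in a list of body lines.

    A link starts on a line containing "http" and ends on the first
    line (that one or a later one) containing ">".
    """
    links = []
    lines = iter(body)  # shared iterator: inner reads advance the outer loop
    for line in lines:
        if "http" not in line:
            continue
        parts = [line.strip()]
        extra = 0
        # Pull continuation lines until the closing ">" shows up,
        # the body runs out, or the line limit is reached.
        while ">" not in parts[-1] and extra < max_extra_lines:
            nxt = next(lines, None)
            if nxt is None:
                break
            parts.append(nxt.strip())
            extra += 1
        links.append("".join(parts))
    return links
```

Each link gets its own `parts` list, so nothing carries over from one link to the next. The returned strings still include any surrounding text on those lines and need the same cleanup as before.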

a function inside a function! funky! maybe it can be done in another way? http://stackoverflow.com/questions/4831680/function-inside-function — Ulf Gjerdingen, Jun 06 '16 at 17:46
thanks I had it outside as a separate function, I added it inside for this post, to reduce code comments :-) Thanks for the insight though, but just so no one assumes this solves it, this doesn't provide any help into my actual problem. Many thanks. — Thomas Wilson, Jun 06 '16 at 19:20

