
I am writing an email parser in Python and am looking for a way to extract all previous emails (forwarded or replied to) from an email body. The script has to support as many email clients as possible (Gmail, Outlook, iPhone, etc.). For example, if the body is:

example email text

On Jul 31, 2013, at 5:15 PM, John Doe <jdoe@gmail.com> wrote:

> example email text
>
>
> *From:* Me [mailto:me@gmail.com]
> *Sent:* Thursday, May 31, 2012 3:54 PM
> *To:* John Doe
> *Subject:* RE: subject
>
> example email text

The result should be an array with 3 entries, each containing the email text and as much metadata as possible (date, sender, subject, etc.).

Is there a standard or modern way of achieving this? Is there a maintained list of the reply and forward formats used by different clients? I've searched for similar questions but have found no satisfying answer so far.

Tzach
  • Try regular expressions to identify patterns in the emails. If you are flexible, use AWK instead. – Vivek Sep 12 '13 at 15:09
  • Thanks, but the real problem is building these regexes or the AWK code. I'm looking for existing code or an algorithm. – Tzach Sep 12 '13 at 15:23

1 Answer


I found this library, which might be useful:

https://github.com/zapier/email-reply-parser
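For formats it does not handle, a rough standard-library sketch of the header-splitting idea might look like the following. This is not that library's API: the function name `split_thread`, the output keys, and both header patterns are assumptions made for illustration, modeled only on the sample body in the question. Real clients use many more header variants and languages, so treat it as a starting point rather than a complete solution.

```python
import re

# Leading "> " quote markers, possibly nested ("> > text").
QUOTE_PREFIX = re.compile(r"^(?:>\s?)+")
# "On <date>, <name> <address> wrote:" as in the question's sample.
# Assumption: the sender part contains no comma.
REPLY_HEADER = re.compile(r"^On (?P<date>.+), (?P<sender>[^,]+<[^>]+>) wrote:$")
# Outlook-style header fields, optionally wrapped in asterisks.
FIELD = re.compile(r"^\*?(?P<name>From|Sent|To|Subject):\*?\s*(?P<value>.*)$")
# Hypothetical mapping from header field names to output keys.
KEYS = {"From": "sender", "Sent": "date", "To": "to", "Subject": "subject"}


def _finish(current):
    """Turn the accumulated lines and metadata into one result entry."""
    entry = dict(current["meta"])
    entry["text"] = "\n".join(current["lines"]).strip()
    return entry


def split_thread(body):
    """Split an email body into a list of messages, newest first."""
    messages = []
    current = {"meta": {}, "lines": []}
    in_header = False  # True while reading a From/Sent/To/Subject block

    for raw in body.splitlines():
        line = QUOTE_PREFIX.sub("", raw).rstrip()
        reply = REPLY_HEADER.match(line)
        field = FIELD.match(line)
        if reply:
            # An "On ... wrote:" line starts a new quoted message.
            messages.append(_finish(current))
            current = {"meta": reply.groupdict(), "lines": []}
            in_header = False
        elif field and field.group("name") == "From":
            # A "From:" line also starts a new quoted message.
            messages.append(_finish(current))
            current = {"meta": {"sender": field.group("value")}, "lines": []}
            in_header = True
        elif field and in_header:
            # Remaining fields of the same header block.
            current["meta"][KEYS[field.group("name")]] = field.group("value")
        else:
            in_header = False
            current["lines"].append(line)

    messages.append(_finish(current))
    return messages
```

Calling `split_thread(body)` on the sample from the question yields three entries. One known weakness of this sketch is that any body line starting with `From:` is treated as the start of a new message, so it would need tightening before use on real mail.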

Edward van Kuik
  • Thank you. I've already checked this library. Its functionality is very limited and simple, and it does not cover most real-world cases. – Tzach Feb 19 '14 at 10:25