See the code below, which comes from http://www.damnsemicolon.com/php/php-parse-email-body-email-piping:

//get rid of any quoted text in the email body
$body_array = explode("\n",$body);
$message = "";
foreach($body_array as $key => $value){

    //remove hotmail sig
    if($value == "_________________________________________________________________"){
        break;

    //original message quote
    } elseif(preg_match("/^-*(.*)Original Message(.*)-*/i",$value,$matches)){
        break;

    //check for date wrote string
    } elseif(preg_match("/^On(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for From Name email section
    } elseif(preg_match("/^On(.*)$fromName(.*)/i",$value,$matches)) {
        break;

    //check for To Name email section
    } elseif(preg_match("/^On(.*)$toName(.*)/i",$value,$matches)) {
        break;

    //check for To Email email section
    } elseif(preg_match("/^(.*)$toEmail(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for From Email email section
    } elseif(preg_match("/^(.*)$fromEmail(.*)wrote:(.*)/i",$value,$matches)) {
        break;

    //check for quoted ">" section
    } elseif(preg_match("/^>(.*)/i",$value,$matches)){
        break;

    //check for date wrote string with dashes
    } elseif(preg_match("/^---(.*)On(.*)wrote:(.*)/i",$value,$matches)){
        break;

    //add line to body
    } else {
        $message .= "$value\n";
    }

}

//compare before and after
echo "$body<br><br><br>$message";

$body contains the complete email body, including the quoted area if the email is a reply. This loop removes the quoted area so that only the new reply remains in $message. But as suggested there, the loop is slow and it is better to use preg_replace instead. How can I do that?

What should I replace the patterns with? Should I remove the foreach loop too? I wrote the code below without the foreach loop, but it seems wrong. Please advise.

$patterns = array(
"_________________________________________________________________",
"/^-*(.*)Original Message(.*)-*/i",
"/^On(.*)wrote:(.*)/i",
"/^On(.*)$fromName(.*)/i",
"/^On(.*)$toName(.*)/i",
"/^(.*)$toEmail(.*)wrote:(.*)/i",
"/^(.*)$fromEmail(.*)wrote:(.*)/i",
"/^>(.*)/i",
"/^---(.*)On(.*)wrote:(.*)/i");

$message = preg_replace($patterns, '', $body);
n099y
user4271704
  • You'd at least need the `/m` multiline modifier to make them work outside of a line-wise foreach. Otherwise explain with concrete samples and error messages how it's not working. – mario Jul 06 '15 at 16:20
  • Could you please just rewrite the code block from that link with preg_replace instead of preg_match, so that I can understand it better? – user4271704 Jul 07 '15 at 07:35
  • Add the code to your question next time, do not link it. – n099y Jul 09 '15 at 11:56
  • Hmm, such useful answers I always get on stackoverflow! – user4271704 Jul 10 '15 at 06:04
  • Is there anyone on Stack Overflow who can give useful advice instead of the meaningless/useless comments above? – user4271704 Jul 21 '15 at 12:56

1 Answer

You already narrowed it down to a workable solution. Only a few things to fix:

  1. As @mario commented, you need to set the /m modifier so that each ^ matches at the beginning of every line.
  2. Your first pattern needs to be enclosed in delimiters and anchored to ^ and to the end of the line to maintain the same meaning as in the original code.
  3. Include the newline characters in order to remove the whole line.
  4. Make sure the variables $fromName, $fromEmail, etc. are set.
  5. Once you get a match, match everything from there to the end of the body with (?s:.*).

Code:

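$patterns = array(
    // Hotmail signature: a line made up of 30 or more underscores
    "/^_{30,}$(?s:.*)/m",
    // "Original Message" separator
    "/^.*Original Message(?s:.*)/im",
    // "On <date> <someone> wrote:" header, optionally preceded by dashes
    "/^(?:---.*)?On .* wrote:(?s:.*)/im",
    // "On ..." header containing the sender or recipient name
    "/^On .* $fromName(?s:.*)/im",
    "/^On .* $toName(?s:.*)/im",
    // "<email> ... wrote:" header containing the recipient or sender address
    "/^.*$toEmail(.*)wrote:(?s:.*)/im",
    "/^.*$fromEmail.* wrote:(?s:.*)/im",
    // First quoted line (starting with ">") and everything after it
    "/^>.*/ims",
);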
$message = preg_replace($patterns, '', $body);
echo "$body<br><br><br>$message";
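Points 2 and 4 above can be made a little more robust. The following is only a sketch, not part of the original code: the function name strip_quoted_reply and its $names/$emails parameters are hypothetical. It assumes you want to skip any name or address that is empty (an empty value would turn its pattern into something that matches almost every line) and to escape the values with preg_quote so that characters such as "." in an address are matched literally.

```php
<?php
// Hypothetical helper (not from the original code): strips the quoted part of a reply.
// $names and $emails are the sender/recipient values, if known.
function strip_quoted_reply(string $body, array $names = [], array $emails = []): string
{
    // Patterns that do not depend on any variable.
    $patterns = [
        '/^_{30,}$(?s:.*)/m',
        '/^.*Original Message(?s:.*)/im',
        '/^(?:---.*)?On .* wrote:(?s:.*)/im',
        '/^>.*/ims',
    ];

    foreach ($names as $name) {
        $name = trim((string) $name);
        if ($name === '') {
            continue; // an empty name would make the pattern match far too much
        }
        // preg_quote escapes regex metacharacters and the "/" delimiter.
        $patterns[] = '/^On .* ' . preg_quote($name, '/') . '(?s:.*)/im';
    }

    foreach ($emails as $email) {
        $email = trim((string) $email);
        if ($email === '') {
            continue; // same reasoning as for empty names
        }
        $patterns[] = '/^.*' . preg_quote($email, '/') . '.*wrote:(?s:.*)/im';
    }

    // preg_replace returns null on error; fall back to the untouched body.
    return preg_replace($patterns, '', $body) ?? $body;
}

// Usage with made-up sample data:
$sample = "Thanks, that works.\n\nOn Mon, Jul 6, 2015, Bob <bob@example.com> wrote:\n> old text\n";
echo strip_quoted_reply($sample, ['Bob'], ['bob@example.com']);
```

Building the list at run time means a missing $fromName or $fromEmail simply contributes no pattern instead of a broken one.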


A word of advice:

Take into account that it will also strip ordinary lines that happen to fit one of the patterns (and everything after them), such as:

On Monday I checked what you wrote: ...
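One way to reduce such false positives, offered as a sketch rather than a fix: drop the /i modifier and require the header to fill the whole line, ending in "wrote:". The function name is_reply_header is hypothetical, and the assumption that a real reply header sits on a single line ending in "wrote:" will not hold for every mail client (a header wrapped over two lines would be missed).

```php
<?php
// Hypothetical stricter check (an assumption, not the original patterns):
// case-sensitive "On", and "wrote:" must be the last thing on the line.
function is_reply_header(string $line): bool
{
    return preg_match('/^(?:-{3,}\s*)?On .+ wrote:\s*$/', $line) === 1;
}

var_dump(is_reply_header('On Mon, Jul 6, 2015, Bob <bob@example.com> wrote:')); // bool(true)
var_dump(is_reply_header('On Monday I checked what you wrote: it looks fine.')); // bool(false)
var_dump(is_reply_header('only thing I wrote: ...'));                            // bool(false)
```

This trades false positives for false negatives, so it is worth testing against real messages from the clients you expect.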
Mariano
  • Thanks, so what should I do to prevent the problem you mentioned? How can I make it remove only the quoted area of the email? – user4271704 Nov 01 '15 at 14:31
  • And fromName and fromEmail will not necessarily be present in the reply body, so how do I prevent the problem you mentioned in #4? – user4271704 Nov 01 '15 at 14:41
  • There will always be exceptions here, as there isn't a perfect solution. Test it with different e-mails, remove `/i` if you can, and keep searching online for more info. Here's an article you may find interesting: [Parse email content from quoted reply](http://stackoverflow.com/a/279417/5290909). – Mariano Nov 02 '15 at 02:45