    #!/usr/bin/php -q
    <?php
    $savefile = "savehere.txt";
    $sf = fopen($savefile, 'a') or die("can't open file");
    ob_start();

    // read from stdin
    $fd = fopen("php://stdin", "r");
    $email = "";
    while (!feof($fd)) {
        $email .= fread($fd, 1024);
    }
    fclose($fd);
    // handle email
    $lines = explode("\n", $email);

    // empty vars
    $from = "";
    $subject = "";
    $headers = "";
    $message = "";
    $splittingheaders = true;

    for ($i=0; $i < count($lines); $i++) {
        if ($splittingheaders) {
            // this is a header
            $headers .= $lines[$i]."\n";

            // look out for special headers
            if (preg_match("/^Subject: (.*)/", $lines[$i], $matches)) {
                $subject = $matches[1];
            }
            if (preg_match("/^From: (.*)/", $lines[$i], $matches)) {
                $from = $matches[1];
            }
            if (preg_match("/^To: (.*)/", $lines[$i], $matches)) {
                $to = $matches[1];
            }
        } else {
            // not a header, but message
            $message .= $lines[$i]."\n";
        }

        if (trim($lines[$i])=="") {
            // empty line, header section has ended
            $splittingheaders = false;
        }
    }
/*$headers is ONLY included in the result at the last section of my question here*/
    fwrite($sf,"$message");
    ob_end_clean();
    fclose($sf);
    ?>

That is an example of my attempt. The problem is I am getting too much in the file. Here is what is being written to the file: (I just sent a bunch of garbage to it as you can see)

From xxxxxxxxxxxxx Tue Sep 07 16:26:51 2010
Received: from xxxxxxxxxxxxxxx ([xxxxxxxxxxx]:3184 helo=xxxxxxxxxxx)
    by xxxxxxxxxxxxx with esmtpa (Exim 4.69)
    (envelope-from <xxxxxxxxxxxxxxxx>)
    id 1Ot4kj-000115-SP
    for xxxxxxxxxxxxxxxxxxx; Tue, 07 Sep 2010 16:26:50 -0400
Message-ID: <EE3B7E26298140BE8700D9AE77CB339D@xxxxxxxxxxx>
From: "xxxxxxxxxxxxx" <xxxxxxxxxxxxxx>
To: <xxxxxxxxxxxxxxxxxxxxx>
Subject: stackoverflow is helping me
Date: Tue, 7 Sep 2010 16:26:46 -0400
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_0169_01CB4EA9.773DF5E0"
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-Mailer: Microsoft Windows Live Mail 14.0.8089.726
X-MIMEOLE: Produced By Microsoft MimeOLE V14.0.8089.726

This is a multi-part message in MIME format.

------=_NextPart_000_0169_01CB4EA9.773DF5E0
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

111
222
333
444
------=_NextPart_000_0169_01CB4EA9.773DF5E0
Content-Type: text/html;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content=3Dtext/html;charset=3Diso-8859-1 =
http-equiv=3DContent-Type>
<META name=3DGENERATOR content=3D"MSHTML 8.00.6001.18939"></HEAD>
<BODY style=3D"PADDING-LEFT: 10px; PADDING-RIGHT: 10px; PADDING-TOP: =
15px"=20
id=3DMailContainerBody leftMargin=3D0 topMargin=3D0 =
CanvasTabStop=3D"true"=20
name=3D"Compose message area">
<DIV><FONT face=3DCalibri>111</FONT></DIV>
<DIV><FONT face=3DCalibri>222</FONT></DIV>
<DIV><FONT face=3DCalibri>333</FONT></DIV>
<DIV><FONT face=3DCalibri>444</FONT></DIV></BODY></HTML>

------=_NextPart_000_0169_01CB4EA9.773DF5E0--

I found this while searching around, but I have no idea how to implement it, where to insert it in my code, or whether it would work.

preg_match("/boundary=\".*?\"/i", $headers, $boundary);
$boundaryfulltext = $boundary[0];

if ($boundaryfulltext!="")
{
    $find = array("/boundary=\"/i", "/\"/i");
    $boundarytext = preg_replace($find, "", $boundaryfulltext);
    $splitmessage = explode("--" . $boundarytext, $message);
    $fullmessage = ltrim($splitmessage[1]);
    preg_match('/\n\n(.*)/is', $fullmessage, $splitmore);

    if (substr(ltrim($splitmore[0]), 0, 2)=="--")
    {
        $actualmessage = $splitmore[0];
    }
    else
    {
        $actualmessage = ltrim($splitmore[0]);
    }
}
else
{
    $actualmessage = ltrim($message);
}

$clean = array("/\n--.*/is", "/=3D\n.*/s");
$cleanmessage = trim(preg_replace($clean, "", $actualmessage));

So, how can I get just the plain text part of the email into my file or script for further handling?

Thanks in advance. stackoverflow is great!

Daniel Vandersluis
Jimbo
  • Is that the full email? It's missing the `Content-Type: multipart/mixed` header, which should specify what the boundary string is (which the code you found needs). – Daniel Vandersluis Sep 07 '10 at 20:15
  • That is just the part of the email that is saved to the file. That is as stripped down as I could get it using the first code example. – Jimbo Sep 07 '10 at 20:18
  • The boundary header is important to be able to parse your email as it specifies where each *part* of the email begins and ends. Without it, all you can do is guess, and you know what they say about assuming... ;) For instance, for your quoted email, there should be a header like: `Content-Type: multipart/mixed; boundary="----=_NextPart_000_0163_01CB4EA5.46466520"` – Daniel Vandersluis Sep 07 '10 at 20:19
  • Would the boundaries be the same coming from different pc based email clients or the popular free email accounts? – Jimbo Sep 07 '10 at 20:24
  • I added the headers var to the file write and edited my question to add that info for you guys/gals... – Jimbo Sep 07 '10 at 20:32
  • @Jimbo there you go, now the boundary is there (see the 13th/14th line in your header). The boundary is often different *per email*; it's just a way for the email client to know how to separate a multipart email into its various parts (otherwise all the parts would be displayed!) – Daniel Vandersluis Sep 07 '10 at 20:35
  • Ok, so how do I do this? So I can get just the text of the email no matter where I send it from, who sent it or whatever the boundary is located?? Am I just too stupid to understand what you are telling me? (please don't answer that last part ;-) – Jimbo Sep 07 '10 at 20:45

2 Answers


There are four steps that you will have to take in order to isolate the plain text part of your email body:

1. Get the MIME boundary string

We can use a regular expression to search your headers (let's assume they're in a separate variable, $headers):

$matches = array();
preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $headers, $matches);
list(, $boundary) = $matches;

The regular expression will search for the Content-Type header that contains the boundary string, and then capture it into the first capture group. We then copy that capture group into variable $boundary.
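
As a quick check, here is a minimal sketch that runs the same pattern against the Content-Type header from the email in the question. The header is folded across two lines there, which the `\s*` in the pattern absorbs. The `isset()` fallback is an addition for the case where no boundary is found, and note that this pattern assumes the boundary value is quoted, which MIME does not strictly require.

```php
<?php
// Sample headers copied from the email in the question; the Content-Type
// header is folded onto a second line.
$headers = "MIME-Version: 1.0\r\n"
    . "Content-Type: multipart/alternative;\r\n"
    . "    boundary=\"----=_NextPart_000_0169_01CB4EA9.773DF5E0\"\r\n"
    . "X-Priority: 3\r\n";

$matches = array();
preg_match('#Content-Type: multipart/[^;]+;\s*boundary="([^"]+)"#i', $headers, $matches);

// isset() avoids an undefined-offset notice when the header is missing.
$boundary = isset($matches[1]) ? $matches[1] : null;

var_dump($boundary);
```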

2. Split the email body into segments

Once we have the boundary, we can split the body into its various parts (in your message body, the boundary will be prefaced by -- each time it appears). According to the MIME spec, everything before the first boundary should be ignored.

$email_segments = explode('--' . $boundary, $message);
array_shift($email_segments); // drop everything before the first boundary

This will leave us with an array containing all the segments, with everything before the first boundary ignored.
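
One detail worth knowing: the last boundary in a message is followed by `--` (the closing delimiter), so the split also produces a trailing piece that is not a real part. The sketch below shows this on a made-up message; the boundary `BOUNDARY123` and the sample body are assumptions for illustration, not taken from your email.

```php
<?php
// Hypothetical boundary and message, for illustration only.
$boundary = "BOUNDARY123";
$message = "This is a multi-part message in MIME format.\r\n"
    . "\r\n"
    . "--BOUNDARY123\r\n"
    . "Content-Type: text/plain\r\n"
    . "\r\n"
    . "111\r\n"
    . "--BOUNDARY123\r\n"
    . "Content-Type: text/html\r\n"
    . "\r\n"
    . "<p>111</p>\r\n"
    . "--BOUNDARY123--\r\n";

$email_segments = explode('--' . $boundary, $message);
array_shift($email_segments); // drop everything before the first boundary

// The closing delimiter leaves a final piece starting with "--"; drop it too.
$last = end($email_segments);
if ($last !== false && strncmp(ltrim($last), '--', 2) === 0) {
    array_pop($email_segments);
}

echo count($email_segments), "\n";
```

For the text/plain search in the next step the leftover piece is harmless, so dropping it is optional.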

3. Determine which segment is plain text.

The segment that is plain text will have a Content-Type header with the MIME-type text/plain. We can now search each segment for the first segment with that header:

foreach ($email_segments as $segment)
{
  if (stristr($segment, "Content-Type: text/plain") !== false)
  {
    // We found the segment we're looking for!
  }
}

Since what we're looking for is a constant, we can use stristr (which finds the first instance of a substring in a string, case insensitively) instead of a regular expression. If the Content-Type header is found, we've got our segment.
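
A slightly stricter variant, sketched below, looks only at each segment's own header block (everything up to the first blank line), so a body that merely mentions "Content-Type: text/plain" is not picked by mistake. The helper name `find_plain_text_segment` and the sample segments are hypothetical.

```php
<?php
// Hypothetical helper: returns the first segment whose header block
// declares text/plain, or null if there is none.
function find_plain_text_segment(array $segments)
{
    foreach ($segments as $segment) {
        // The headers of a part end at the first blank line.
        $parts = preg_split('/\r?\n\r?\n/', ltrim($segment), 2);
        if (stripos($parts[0], 'Content-Type: text/plain') !== false) {
            return $segment;
        }
    }
    return null;
}

// Made-up segments: the HTML part mentions text/plain only in its body.
$segments = array(
    "\r\nContent-Type: text/html\r\n\r\n<p>Content-Type: text/plain is mentioned here</p>\r\n",
    "\r\nContent-Type: text/plain;\r\n    charset=\"iso-8859-1\"\r\n\r\n111\r\n222\r\n",
);

$found = find_plain_text_segment($segments);
```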

4. Remove any headers from the segment

Now we need to remove any headers from the segment we found, as we only want the actual message content. There are four MIME headers that can appear here: Content-Type as we saw before, Content-ID, Content-Disposition and Content-Transfer-Encoding. Each header ends with a line break, normally \r\n, although some mail systems hand the message over with a bare \n. A header can also be folded onto following lines that start with whitespace (like the charset line in your example), so the pattern has to allow for both:

$text = preg_replace('/^Content-(Type|ID|Disposition|Transfer-Encoding):.*(?:\r?\n[ \t].*)*\r?\n/mi', "", $segment);

The m modifier makes ^ match at the start of every line, and i makes the match case insensitive. .* collects the rest of the header line, the group (?:\r?\n[ \t].*)* picks up any folded continuation lines, and the final \r?\n consumes the line break that ends the header.

And after this point, $text will contain your email message content.
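
Two follow-ups, offered as a sketch rather than as part of the steps above. First, instead of removing known headers by name, you can split the segment at its first blank line, which also copes with headers not listed here. Second, your sample part is marked `Content-Transfer-Encoding: quoted-printable`, so sequences such as `=3D` stay encoded until the body is passed through PHP's `quoted_printable_decode()`. The helper name `segment_body` and the sample segment are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the decoded body of a single MIME segment.
function segment_body($segment)
{
    // Headers and body are separated by the first blank line.
    $parts = preg_split('/\r?\n\r?\n/', ltrim($segment), 2);
    $headers = $parts[0];
    $body = isset($parts[1]) ? $parts[1] : '';

    // Decode only when the part says it is quoted-printable.
    if (preg_match('/^Content-Transfer-Encoding:\s*quoted-printable/mi', $headers)) {
        $body = quoted_printable_decode($body);
    }
    return trim($body);
}

// Made-up segment with a folded Content-Type header and an encoded "=".
$segment = "\r\nContent-Type: text/plain;\r\n"
    . "    charset=\"iso-8859-1\"\r\n"
    . "Content-Transfer-Encoding: quoted-printable\r\n"
    . "\r\n"
    . "price =3D 5\r\n"
    . "222\r\n";

echo segment_body($segment), "\n";
```

Other transfer encodings (base64, for instance) would need their own handling; this sketch leaves them untouched.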

So to put it all together with your code:

<?php
// read from stdin
$fd = fopen("php://stdin", "r");
$email = "";
while (!feof($fd))
{
    $email .= fread($fd, 1024);
}
fclose($fd);

$matches = array();
preg_match('#Content-Type: multipart\/[^;]+;\s*boundary="([^"]+)"#i', $email, $matches);
// isset() avoids an undefined-offset notice when no boundary was matched
$boundary = isset($matches[1]) ? $matches[1] : null;

$text = "";
if (isset($boundary) && !empty($boundary)) // did we find a boundary?
{
  $email_segments = explode('--' . $boundary, $email);

  foreach ($email_segments as $segment)
  {
    if (stristr($segment, "Content-Type: text/plain") !== false)
    {
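      // strip the part's MIME headers, including folded lines and bare \n line endings
      $text = trim(preg_replace('/^Content-(Type|ID|Disposition|Transfer-Encoding):.*(?:\r?\n[ \t].*)*\r?\n/mi', "", $segment));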
      break;
    }
  }
}

// At this point, $text will either contain your plain text body,
// or be an empty string if a plain text body couldn't be found.

$savefile = "savehere.txt";
$sf = fopen($savefile, 'a') or die("can't open file");
fwrite($sf, $text);
fclose($sf);
?>
Daniel Vandersluis
  • I am beginning to understand, I think.. So, to test would I replace everything after //empty vars??? – Jimbo Sep 07 '10 at 21:23
  • Not exactly. It depends on what you want to do (for instance you might want to continue splitting headers or collecting the "special" headers). My code expects that you'll have one block of text for headers and one for the message, but you could just replace `$headers` and `$message` in my code with `$email` which as per your code should contain the whole email. – Daniel Vandersluis Sep 07 '10 at 21:31
  • AAAH, I don't understand! How can I implement this in my code example above so I can test it? Would I put your snippet before the file write? Then write $text instead of $message? I really appreciate your help AND PATIENCE with this beginner here. – Jimbo Sep 07 '10 at 21:43
  • I updated my code to read in the email (as per your code) and process it. My code snippet should work the way you want without having to make any modifications. If you want to do anything else with the email, I'll leave that to you (or you can ask another question here for further help). – Daniel Vandersluis Sep 07 '10 at 21:48
  • YOU ROCK, I will try after I feed the family here.. Then I will go through line by line in attempt to learn from your obvious expertise. (The least I can do is send you some ego cookies!) – Jimbo Sep 07 '10 at 22:35
  • Old post, but I thought I'd add a quick update from a bug I found. In step 4, I found that the regex would not match the multipart headers because they don't always have a carriage return after them. If you remove the '\r' in that preg, I believe it works for all cases (because if there is one, it will be caught by the '.*?'). So the new one looks like `$text = trim(preg_replace('/Content-(Type|ID|Disposition|Transfer-Encoding):.*?\n/is', "", $segment));` – Jordan Apr 16 '16 at 15:50
  • @Jordan I can confirm this assisted me with emails originating from Outlook via Office365 Exchange. – Tarquin Aug 27 '20 at 02:15

There is one answer here:

You need only to change these 2 lines:

require_once('/path/to/class/rfc822_addresses.php');
require_once('/path/to/class/mime_parser.php');
Lance Roberts
Mladen