
I am piping an email to a program and running some code.

**I know how to get the "From:" and the "Subject:", but how do I get only the body of the email?**

#!/usr/bin/php -q
<?php

// read the whole piped message from stdin
$email = '';
$fd = fopen("php://stdin", "r");
while (!feof($fd)) {
  $email .= fread($fd, 1024);
}
fclose($fd);

$lines = explode("\n", $email);

for ($i = 0; $i < count($lines); $i++)
{
    // look out for special headers
    if (preg_match("/Subject:/", $lines[$i], $matches))
    {
        list($One, $Subject) = explode("Subject:", $lines[$i]);
        list($Subject, $Gone) = explode("<", $Subject);
    }

    // etc...
}

How do I get the body content of the email?

vizenor

1 Answer


Basically, you need to find where the headers end, and to know whether the message is multipart, so you can extract the right portion(s) of the email.

Here is some information:

parsing raw email in php

It says that the first double newline (the first empty line) marks the beginning of the body of the email.
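
As a minimal sketch of that rule, assuming the whole message is already in a string: the helper name `split_headers_and_body` is hypothetical, and it only covers a single-part message (a multipart body still has to be split on its boundary afterwards).

```php
<?php
// Hypothetical helper: split a raw message at the first empty line.
function split_headers_and_body(string $raw): array
{
    // Raw mail normally uses CRLF line endings; normalize them so that
    // "\n\n" reliably marks the empty line after the headers.
    $raw = str_replace("\r\n", "\n", $raw);

    // A limit of 2 keeps empty lines inside the body intact.
    $parts = explode("\n\n", $raw, 2);

    return array(
        'headers' => $parts[0],
        'body'    => isset($parts[1]) ? $parts[1] : '',
    );
}

// Made-up sample message, for illustration only.
$raw = "From: a@example.com\r\nSubject: Hi\r\n\r\nFirst line\r\n\r\nSecond line\r\n";
$message = split_headers_and_body($raw);
echo $message['body'];
```

In a piped script, `$raw` would come from `file_get_contents("php://stdin")` instead of the sample string.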

This page might give you some other ideas (see script below):

http://thedrupalblog.com/configuring-server-parse-email-php-script

#!/usr/bin/php
<?php

// fetch data from stdin
$data = file_get_contents("php://stdin");

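// extract the body
// NOTE: a properly formatted email's first empty line defines the separation between the headers and the message body
// normalize CRLF line endings first, since raw mail usually uses "\r\n"
$data = str_replace("\r\n", "\n", $data);
list($data, $body) = array_pad(explode("\n\n", $data, 2), 2, '');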

// explode on new line
$data = explode("\n", $data);

// define a variable map of known headers
$patterns = array(
  'Return-Path',
  'X-Original-To',
  'Delivered-To',
  'Received',
  'To',
  'Message-Id',
  'Date',
  'From',
  'Subject',
);

// define variables to hold parsed headers and the name of the last matched header
$headers = array();
$last_match = null;

// loop through data
foreach ($data as $data_line) {

  // for each line, assume a match does not exist yet
  $pattern_match_exists = false;

  // check for lines that start with white space
  // NOTE: if a line starts with a white space, it signifies a continuation of the previous header
  if ((substr($data_line,0,1)==' ' || substr($data_line,0,1)=="\t") && $last_match) {

    // append to last header
    $headers[$last_match][] = $data_line;
    continue;

  }

  // loop through patterns
  foreach ($patterns as $key => $pattern) {

    // create preg regex
    $preg_pattern = '/^' . $pattern .': (.*)$/';

    // execute preg
    preg_match($preg_pattern, $data_line, $matches);

    // check if preg matches exist
    if (count($matches)) {

      $headers[$pattern][] = $matches[1];
      $pattern_match_exists = true;
      $last_match = $pattern;

    }

  }

  // check if a pattern did not match for this line
  if (!$pattern_match_exists) {
    $headers['UNMATCHED'][] = $data_line;
  }

}

?>

EDIT

Here is a PHP extension called MailParse:

http://pecl.php.net/package/mailparse

Somebody has built a class around it called MimeMailParse:

http://code.google.com/p/php-mime-mail-parser/

And here is a blog entry discussing how to use it:

http://www.bucabay.com/web-development/a-php-mime-mail-parser-using-mailparse-extension/

Jared Farrish
  • Wow... you are so fast. I am not sure I understand this yet. What variable do I echo out to get access to the body then? :) – vizenor Apr 20 '11 at 02:24
  • I noticed it works for html emails but if you send from Gmail you get this: – vizenor Apr 20 '11 at 02:32
  • --90e6ba1efd480d5bc804a150689a Content-Type: text/plain; charset=ISO-8859-1 MY CONTENT I SENT --90e6ba1efd480d5bc804a150689a Content-Type: text/html; charset=ISO-8859-1 MY CONTENT I SENT --90e6ba1efd480d5bc804a150689a-- – vizenor Apr 20 '11 at 02:33
  • Those are content-type boundaries. Essentially, it's sending two different versions of the email, text and html (for different email clients that may not support html), and uses the `--90e6ba1efd480d5bc804a150689a` to 'demarcate' or mark a 'boundary' where one message block starts and stops. This is also how attachments are included (as text) within the message packets. – Jared Farrish Apr 20 '11 at 02:38
  • You should also see in Gmail's See Original the line: `Content-Type: multipart/alternative; boundary="90e6ba1efd480d5bc804a150689a"`, with the part between the quotation marks a randomly-generated string meant to be one-use (for the most part). – Jared Farrish Apr 20 '11 at 02:39
  • Sorry, I am still working on this problem. Is there PHP code to get both types of content, the HTML and the text? I have worked for hours on this. – vizenor Apr 20 '11 at 18:24
  • @vizenor - Give this a shot: http://www.bucabay.com/web-development/a-php-mime-mail-parser-using-mailparse-extension/ – Jared Farrish Apr 20 '11 at 21:41
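
For the multipart case discussed in the comments, here is a rough sketch of splitting the body on its boundary to get both the text and the HTML version. The helper name `split_multipart` is hypothetical. It assumes the header block and the body have already been separated at the first empty line, that the parts are not nested, and that no transfer encoding (quoted-printable, base64) needs decoding. A real parser such as the MailParse extension handles those cases.

```php
<?php
// Hypothetical helper: return the parts of a multipart body keyed by content type.
// $headers is the raw header block, $body is everything after the first empty line.
function split_multipart(string $headers, string $body): array
{
    // Find the boundary parameter, quoted or unquoted.
    if (!preg_match('/boundary="?([^";\r\n]+)"?/i', $headers, $m)) {
        return array(); // no boundary, so not a multipart message
    }
    $boundary = $m[1];

    $body = str_replace("\r\n", "\n", $body);
    $chunks = explode('--' . $boundary, $body);
    array_shift($chunks); // text before the first delimiter is the preamble

    $parts = array();
    foreach ($chunks as $chunk) {
        if (strpos($chunk, '--') === 0) {
            break; // closing delimiter "--boundary--"
        }
        // Each part has its own headers, an empty line, then its content.
        $pieces = explode("\n\n", ltrim($chunk, "\n"), 2);
        $content = isset($pieces[1]) ? $pieces[1] : '';
        if (preg_match('/^Content-Type:\s*([^;\s]+)/mi', $pieces[0], $t)) {
            $parts[strtolower($t[1])] = rtrim($content, "\n");
        }
    }
    return $parts;
}

// Sample data modelled on the Gmail message quoted in the comments.
$b = '90e6ba1efd480d5bc804a150689a';
$headers = "Content-Type: multipart/alternative; boundary=\"$b\"";
$body = "--$b\r\nContent-Type: text/plain; charset=ISO-8859-1\r\n\r\nMY CONTENT I SENT\r\n"
      . "--$b\r\nContent-Type: text/html; charset=ISO-8859-1\r\n\r\n<p>MY CONTENT I SENT</p>\r\n"
      . "--$b--\r\n";

$parts = split_multipart($headers, $body);
echo isset($parts['text/plain']) ? $parts['text/plain'] : '';
```

The result is an array such as `$parts['text/plain']` and `$parts['text/html']`; a message without a boundary returns an empty array, in which case the whole body is the content.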