problem:
I am working on an email system. We receive emails and store them in a MySQL DB. The body is parsed, headers are stripped out, etc. This works fine for plain-text emails, but when we receive an email in MIME format, the body data is stored in the DB and looks like this:
This is a multi-part message in MIME format.
------=_NextPart_000_1B20_01CCA865.03078710
Content-Type: text/plain;
charset=\"us-ascii\"
Content-Transfer-Encoding: 7bit
This Message is intended for the indicated recipients only and may be
confidential. If this message has been sent to you in error you must take no
action based on it, nor must you copy or show it to anyone; please inform us
immediately and delete this message.
------=_NextPart_000_1B20_01CCA865.03078710
Content-Type: text/html;
charset=\"us-ascii\"
Content-Transfer-Encoding: quoted-printable
<html xmlns:v=3D\"urn:schemas-microsoft-com:vml\" =
xmlns:o=3D\"urn:schemas-microsoft-com:office:office\" =
xmlns:w=3D\"urn:schemas-microsoft-com:office:word\" =
xmlns:m=3D\"http://schemas.microsoft.com/office/2004/12/omml\" =
xmlns=3D\"http://www.w3.org/TR/REC-html40\"><head><META =
HTTP-EQUIV=3D\"Content-Type\" CONTENT=3D\"text/html; =
charset=3Dus-ascii\"><meta name=3DGenerator content=3D\"Microsoft Word 12 =
(filtered medium)\"><style><!--
/* Font Definitions */
@font-face
{font-family:\"Cambria Math\";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:\"Calibri\",\"sans-serif\";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
{mso-style-priority:99;
mso-style-link:\"Balloon Text Char\";
margin:0cm;
margin-bottom:.0001pt;
font-size:8.0pt;
font-family:\"Tahoma\",\"sans-serif\";}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:\"Calibri\",\"sans-serif\";
color:windowtext;}
span.BalloonTextChar
{mso-style-name:\"Balloon Text Char\";
mso-style-priority:99;
mso-style-link:\"Balloon Text\";
font-family:\"Tahoma\",\"sans-serif\";}
..MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D\"edit\" spidmax=3D\"1026\" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D\"edit\">
<o:idmap v:ext=3D\"edit\" data=3D\"1\" />
</o:shapelayout></xml><![endif]--></head><body lang=3DEN-GB link=3Dblue =
vlink=3Dpurple><div class=3DWordSection1><p class=3DMsoNormal><span =
style=3D\'font-size:7.5pt;font-family:\"Verdana\",\"sans-serif\";color:#1F497D=
\'>This Message is intended for the indicated recipients only and may be =
confidential. If this message has been sent to you in error you must =
take no action based on it, nor must you copy or show it to anyone; =
please inform us immediately and delete this message. </span><span =
style=3D\'color:#1F497D\'><o:p></o:p></span></p><p =
class=3DMsoNormal><o:p> </o:p></p></div></body></html>
------=_NextPart_000_1B20_01CCA865.03078710--
.
We want to strip out everything except the text-only version. Are there any regex experts out there who can solve this? We have tried several classes and other PHP systems, but they always return the same content that was originally input, not the plain text we are after. Any ideas? Regex preferred. We are thinking along the lines of detecting text/plain and then a series of line breaks to find where the plain-text content starts...
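To make the idea concrete, here is a rough sketch of the approach we have in mind. It assumes the stored body still contains the blank line that separates each part's headers from its content, and that the first line starting with "--" is the boundary delimiter. The function name extractPlainText is made up for illustration, and nested multipart messages are not handled.

```php
<?php
// Hypothetical helper: returns the first text/plain part of a multipart
// body, or null if no such part is found. A sketch, not a full MIME parser.
function extractPlainText(string $body): ?string
{
    // Normalise line endings so the patterns only deal with "\n".
    $body = str_replace(["\r\n", "\r"], "\n", $body);

    // Assumption: the first line of the form "--something" is the boundary.
    if (!preg_match('/^--(\S+)$/m', $body, $m)) {
        return null;
    }
    $delimiter = '/^--' . preg_quote($m[1], '/') . '(?:--)?[ \t]*$/m';

    foreach (preg_split($delimiter, $body) as $part) {
        // Part headers end at the first blank line; the rest is content.
        $pieces = preg_split('/\n[ \t]*\n/', $part, 2);
        if (count($pieces) < 2) {
            continue;
        }
        [$headers, $content] = $pieces;

        if (!preg_match('/^Content-Type:\s*text\/plain\b/mi', $headers)) {
            continue;
        }

        // Undo the transfer encoding when one is declared.
        if (preg_match('/^Content-Transfer-Encoding:\s*(\S+)/mi', $headers, $enc)) {
            $encoding = strtolower($enc[1]);
            if ($encoding === 'quoted-printable') {
                $content = quoted_printable_decode($content);
            } elseif ($encoding === 'base64') {
                $content = (string) base64_decode($content);
            }
        }

        return trim($content);
    }

    return null;
}
```

Called as extractPlainText($row['body']) (the column name is only an example), this would return just the plain-text part, or null for a body that is not multipart.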