2

I am writing a simple HTML email design editor in PHP that also shows a demo of how the email will look.

I think it would also be very useful to show the user how the email will look in a client such as Gmail with images turned off.

What is my best approach for this? Does anybody know how this is done in Gmail, Hotmail, etc.?

Do I simply remove the img src attributes and the CSS background: url values with a regular expression?

I would like to remove the background parts from background="url" as used in tables and from background-image:url(url); as used in inline CSS.

I found this question, which has the same kind of idea, although I would like to actually remove the img tags and background-images from the HTML text.

Or could this code be modified to work with background images also?

John Magnolia
  • You could always inject a little bit of CSS that sets all the images to display: none; in the preview. – BenOfTheNorth Apr 08 '12 at 08:52
  • @BenGriffiths, I believe that this is completely wrong, since display:none will still load the images but not display them (thus violating any security that you may have had in mind when implementing such functionality) – mobius Apr 10 '12 at 12:21
  • @mobius For simply generating a preview I don't think he needs to worry about security? – BenOfTheNorth Apr 10 '12 at 12:22
  • @BenGriffiths, The whole point of not loading the images without user consent is that most images sit behind a URL tracking scheme that makes it possible to verify whether the recipient's email address is valid and whether he/she actually opened the email. – mobius Apr 10 '12 at 12:28
  • @mobius - I think he is previewing *before* it's being sent, i.e., checking out the layout. This isn't a preview within an email client by the recipient. At least that's how I've understood what he's written. – BenOfTheNorth Apr 10 '12 at 12:32
  • @BenGriffiths, You are probably right, I misunderstood the question – mobius Apr 10 '12 at 12:40
  • @benGriffiths - Nevertheless, if he's trying to emulate Gmail (or another similar platform), then the best way to do that would be to replicate their methods as best you can. For instance, most clients use a placeholder of some sort that matches the image's dimensions - something "display:none;" can't do on its own. In the event he doesn't need to **exactly** replicate the functionality, "display:none;" would work just fine :] – orourkek Apr 10 '12 at 16:24
  • @orourkek True yes, although visibility:hidden would keep the dimensions I think. Still, there would be other more effective ways instead of CSS I agree. – BenOfTheNorth Apr 10 '12 at 16:33
  • Related: http://stackoverflow.com/questions/9897214/how-to-strip-specific-tags-and-specific-attributes-from-a-string – Madara's Ghost Apr 14 '12 at 13:51
  • Have you done any research so far into what Gmail does to turn the images off? I find it hard to give good suggestions as long as you haven't specified what the outcome should be. E.g. provide an HTML email and then show what the HTML source looks like within Gmail with images turned off. – hakre Apr 15 '12 at 13:59

7 Answers

3

To fully mimic the behavior of Gmail or similar webmail clients, you would replace the img tags and background: CSS properties so that they display a placeholder, making it clear to the user that an image belongs there.

Since the message is usually loaded in an iframe, I believe your best bet would be to clean the message server side, removing all unwanted tags and replacing images accordingly for the preview.

I agree with Michal that it is not wise to use just regex to clean up your HTML; you should probably traverse the DOM tree to be safe.

Why don't you take a look at washtml by Frederic Motte, which is used by Roundcube, to get you started?
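A minimal sketch of the placeholder idea using PHP's built-in DOM extension, not washtml itself. The function name replaceImagesWithPlaceholders, the placeholder styling, and the example.com URL are assumptions made for illustration; the sketch only keeps dimensions given as plain integer width/height attributes.

```php
<?php
// Hypothetical helper: swap every <img> for a bordered box that keeps the
// image's declared dimensions and shows its alt text.
function replaceImagesWithPlaceholders(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy email markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first; replacing nodes while iterating it skips items.
    foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
        $box = $dom->createElement('span');
        $box->appendChild($dom->createTextNode($img->getAttribute('alt')));

        $style = 'display:inline-block;border:1px solid #ccc;overflow:hidden;';
        foreach (['width', 'height'] as $dimension) {
            $value = $img->getAttribute($dimension);
            if (ctype_digit($value)) { // only plain pixel counts, e.g. width="120"
                $style .= "$dimension:{$value}px;";
            }
        }
        $box->setAttribute('style', $style);
        $img->parentNode->replaceChild($box, $img);
    }

    return $dom->saveHTML();
}

echo replaceImagesWithPlaceholders(
    '<p><img src="http://example.com/logo.png" width="120" height="40" alt="Logo"></p>'
);
```

Because the image URL never reaches the browser, nothing is downloaded, while the box still occupies the space the image would have taken.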

Madara's Ghost
mobius
  • That looks like a great starting point for what he's trying to do. Will require some modification if you want it to exactly replicate other platforms' functionalities, but overall some great content there – orourkek Apr 10 '12 at 16:28
3

I would also suggest using PHP DOM instead of regular expressions, which are often inaccurate for HTML. Here is some example code you could use to strip all the img tags and the background attribute of the body tag from your string:

// ...loading the DOM
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;  // set before loading; it is ignored once the document is parsed
libxml_use_internal_errors(true);  // collect parse warnings caused by markup errors instead of printing them
$dom->loadHTML($string);
libxml_clear_errors();
// Here we strip all the img tags in the document.
// The node list is live, so copy it to an array before removing nodes.
$imgs = iterator_to_array($dom->getElementsByTagName('img'));
foreach ($imgs as $img) {
    $img->parentNode->removeChild($img);
}
// This part strips the 'background' attribute from (all) the body tag(s)
foreach ($dom->getElementsByTagName('body') as $body) {
    $body->removeAttribute('background');
}

$str = $dom->saveHTML();

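I've selected the body tags instead of the tables because, in the HTML specification, <table> itself has no background attribute, only bgcolor. (Many email clients nevertheless honor a non-standard background attribute on table and td, so you may want to select those elements too.) To strip the background properties from inline CSS, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM. Try this: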

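// Selecting all the elements since each one could have a style attribute.
// Copy the live node list to an array before modifying the elements.
$tags = iterator_to_array($dom->getElementsByTagName('*'));
foreach ($tags as $tag) {
    if (!$tag->hasAttribute('style')) {
        continue;  // nothing to clean on this element
    }
    // Wrap the inline declarations in a dummy rule so the parser accepts them
    $oParser = new CSSParser("p{".$tag->getAttribute('style')."}");
    $oCss = $oParser->parse();
    foreach ($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    // Strip the dummy rule's prefix and closing brace again
    $css = $oCss->__toString();
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -2, 2);
    if ($css) {
        $tag->setAttribute('style', $css);
    } else {
        // Only background declarations were present: drop the attribute entirely
        $tag->removeAttribute('style');
    }
}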

Putting all this code together, if you have, for example,

$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';

The PHP will output

<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>
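
Since the question mentions background="url" on tables, here is a hedged sketch of how the attribute could be stripped from every element that carries it, rather than from body only, using an XPath query. The function name stripBackgroundAttributes and the example.com URLs are illustrative assumptions.

```php
<?php
// Hypothetical helper: remove the (non-standard but widely used) background
// attribute from any element that has one: body, table, td, th, ...
function stripBackgroundAttributes(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy email markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Removing an attribute does not add or remove elements,
    // so it is safe to modify them while iterating the result.
    foreach ($xpath->query('//*[@background]') as $element) {
        $element->removeAttribute('background');
    }

    return $dom->saveHTML();
}

echo stripBackgroundAttributes(
    '<table background="http://example.com/bg.png"><tr><td bgcolor="#ffffff">cell</td></tr></table>'
);
```

Attributes such as bgcolor are left alone, so solid background colors survive in the preview.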
Marco Gamba
1

Using regular expressions to parse HTML is usually not recommended.

I think a better approach would be to parse the HTML server-side and manipulate it to remove the images or the image src attributes. A library I've had success with is http://simplehtmldom.sourceforge.net/, but I think you can also use the official PHP DOM extension.

The removal of background images might be trickier. You might have to use something like http://www.pelagodesign.com/sidecar/emogrifier/ to apply something like {background: none} to the HTML elements. However, CSS background images are not supported in the latest versions of Microsoft Outlook, so I would recommend not using them at all from the get-go, so that the emails render consistently in most email clients.
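
For inline style attributes only, a small hand-rolled filter may be enough. The sketch below is an assumption-laden illustration, not part of either library mentioned above: the function names are hypothetical, it does not handle escaped quotes inside CSS strings, and it does not touch <style> blocks.

```php
<?php
// Split an inline style into declarations, ignoring semicolons that sit inside
// quotes or parentheses (e.g. in url(data:image/png;base64,...)).
function splitDeclarations(string $style): array
{
    $declarations = [];
    $current = '';
    $depth = 0;    // nesting level of parentheses
    $quote = null; // the quote character we are currently inside, if any

    foreach (str_split($style) as $char) {
        if ($quote !== null) {
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif ($char === '(') {
            $depth++;
        } elseif ($char === ')') {
            $depth = max(0, $depth - 1);
        } elseif ($char === ';' && $depth === 0) {
            $declarations[] = $current;
            $current = '';
            continue;
        }
        $current .= $char;
    }
    $declarations[] = $current;

    // Trim each declaration and drop the empty ones
    return array_values(array_filter(array_map('trim', $declarations), 'strlen'));
}

// Drop the background shorthand and background-image; note that this also
// discards plain colors set through the shorthand.
function stripBackgroundImages(string $style): string
{
    $kept = [];
    foreach (splitDeclarations($style) as $declaration) {
        $property = strtolower(trim(explode(':', $declaration, 2)[0]));
        if ($property === 'background' || $property === 'background-image') {
            continue;
        }
        $kept[] = $declaration;
    }
    return implode('; ', $kept);
}

echo stripBackgroundImages('background-image:url(http://example.com/a.png);border: 1px solid black');
// prints: border: 1px solid black
```

The result can be written back with setAttribute('style', ...) on each element, or the attribute removed when the returned string is empty.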

Michal Charemza
  • Ah, but @Michael Charemza, you have left out the ever-important link to [the reason not to use regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) .... – Tetsujin no Oni Apr 13 '12 at 18:46
1

Like tkone mentioned: perhaps JavaScript / jQuery is the answer.

This will look at all images in your preview area and change the source to a placeholder image. The 'hideBG' class sets the background image to the placeholder as well.

jQuery

$("#previewArea img").each(function(){
  $(this).attr("src","placeholder.jpg");
  $(this).addClass("hideBG");
});

CSS

.hideBG{
  background: url("placeholder.jpg");
}

Not tested, but should work - depending on your setup and needs.

DACrosby
  • It won't work, as it would only run after the page had finished loading, rendering your code useless. – Madara's Ghost Apr 14 '12 at 13:50
  • If it did run on `$(window).load`, it would technically still work, but users would have to see and download all of the content before it changed to the preview mode. As this is for an email design app, that shouldn't really be an issue. If it is an issue, it could be run on `$(document).ready`, which will make all of the changes as soon as the DOM loads regardless of the images' load status (that is, prior to them showing up on the page). – DACrosby Apr 15 '12 at 23:21
1

I've asked a similar question (similar in its solution, not in the actual problem): How to strip specific tags and specific attributes from a string? (Solution)

The solution there is a server-side library which cleans (and formats) HTML input according to predefined settings. Have it remove any src attributes and all background properties.

Madara's Ghost
0

You could always do this on the client end as well.

With this hypothetical code you should be able to do something like the following, pretending that modern browsers all work the same (or use jQuery or something similar):

var email;
var xhr = new XMLHttpRequest();
xhr.open('GET', URL_FOR_EMAIL, true);
xhr.onreadystatechange = function(event){
   if(xhr.readyState === 4 && xhr.status === 200){
        email = HTMLParser(xhr.responseText);

        // Strip the images once the response has arrived.
        // The collection is live, so walk it backwards while removing.
        var imgs = email.getElementsByTagName('img');
        for(var i = imgs.length - 1; i >= 0; i--){
            imgs[i].parentNode.removeChild(imgs[i]);
        }

        // attach the email body to the DOM
        // do something with the images
   }
};
xhr.send();

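HTMLParser from MDN (it relies on Components.classes, which is only available to privileged Mozilla code such as extensions, so ordinary web pages need a different parser):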

function HTMLParser(aHTMLString){
  var html = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", null),
    body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
  html.documentElement.appendChild(body);

  body.appendChild(Components.classes["@mozilla.org/feed-unescapehtml;1"]
    .getService(Components.interfaces.nsIScriptableUnescapeHTML)
    .parseFragment(aHTMLString, false, null, body));

  return body;
}
tkone
0

I think the best way to do it while keeping the change reversible is to use a tag that does not process the "src" attribute.

Example: change every "img" to "br".

So print the filtered HTML first, and to reverse it with Ajax, search for all the br tags that have a src attribute.
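
A hedged variation on the same reversible idea, sketched in PHP: instead of renaming the tag, the src value is parked in a data attribute so the browser has nothing to fetch, and it can be moved back later. The attribute name data-original-src and the function names are assumptions made for this illustration.

```php
<?php
// Parse an HTML string while tolerating the markup errors common in emails.
function loadHtmlDocument(string $html): DOMDocument
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Move the value of attribute $from to attribute $to on every <img>.
function moveImageAttribute(string $html, string $from, string $to): string
{
    $dom = loadHtmlDocument($html);
    // Only attributes change here, so iterating the live list directly is safe.
    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute($from)) {
            $img->setAttribute($to, $img->getAttribute($from));
            $img->removeAttribute($from);
        }
    }
    return $dom->saveHTML();
}

// "Images off": park each src in a data attribute.
function disableImages(string $html): string
{
    return moveImageAttribute($html, 'src', 'data-original-src');
}

// "Images on": restore each parked src.
function enableImages(string $html): string
{
    return moveImageAttribute($html, 'data-original-src', 'src');
}

$off = disableImages('<p><img src="http://example.com/a.png" alt="A"></p>');
echo $off;
echo enableImages($off);
```

The same swap could be done client-side when the user clicks a "display images" control, which is closer to how the answer describes reversing the change.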

Quaid