I would also suggest using PHP's DOM extension instead of regular expressions, which are often unreliable when applied to HTML. Here is some example code you could use to strip all the img tags and the background attributes from your string:
// ...loading the DOM
$dom = new DOMDocument();
@$dom->loadHTML($string); // @ suppresses the parse warnings that malformed markup can trigger
$dom->preserveWhiteSpace = false;
// Here we strip all the img tags in the document
$images = $dom->getElementsByTagName('img');
// Copy the nodes into a plain array first, because the node list is live
$imgs = array();
foreach($images as $img) {
    $imgs[] = $img;
}
foreach($imgs as $img) {
    $img->parentNode->removeChild($img);
}
// This part strips the 'background' attribute from every body tag
$bodies = $dom->getElementsByTagName('body');
$bodybg = array();
foreach($bodies as $bg) {
    $bodybg[] = $bg;
}
foreach($bodybg as $bg) {
    $bg->removeAttribute('background');
}
$str = $dom->saveHTML();
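The nodes are copied into an array before removal because getElementsByTagName() returns a live list: removing a node while iterating the list directly shifts the remaining entries, and some elements get skipped. As a minimal sketch of the same idea in a more compact form, iterator_to_array() can take the snapshot in one call. The sample markup in $html is made up for illustration.

```php
<?php
// Sketch: remove every <img> by snapshotting the live node list first.
// The markup below is a made-up sample, not the asker's real input.
$html = '<html><body><p>a<img src="1.png"><img src="2.png">b</p></body></html>';

$dom = new DOMDocument();
// Suppress warnings from imperfect markup, as in the code above
@$dom->loadHTML($html);

// getElementsByTagName() returns a live list, so copy it before mutating
$images = iterator_to_array($dom->getElementsByTagName('img'));
foreach ($images as $img) {
    $img->parentNode->removeChild($img);
}

// Count the <img> elements that are left in the document
$remaining = $dom->getElementsByTagName('img')->length;
echo $remaining, "\n";
```

Either form works; the explicit copy loop only spells out what iterator_to_array() does here.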
I've selected the body tags rather than the table because <table> itself has no standard background attribute; it only has bgcolor.
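If your markup does carry a background attribute on other elements anyway, one option is to select by attribute rather than by tag name. The following is only a sketch under that assumption: it uses DOMXPath with the expression //*[@background], and the sample markup in $html is invented for the example.

```php
<?php
// Sketch: drop the background attribute from any element that carries one.
// The markup below is a made-up sample used only for illustration.
$html = '<html><body background="bg.png">'
      . '<table background="t.png"><tr><td>x</td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
// Suppress warnings from imperfect markup
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Select every element, whatever its tag, that has a background attribute
foreach ($xpath->query('//*[@background]') as $element) {
    $element->removeAttribute('background');
}

$result = $dom->saveHTML();
echo $result;
```

This avoids having to list every tag name in advance, at the cost of also touching elements you may not have expected to carry the attribute.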
To strip the background inline CSS property, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM, for example:
// Selecting all the elements since each one could have a style attribute
$alltags = $dom->getElementsByTagName('*');
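$tags = array();
foreach($alltags as $tag) {
    $tags[] = $tag;
}
foreach($tags as $tag) {
    // Skip elements that have no inline style at all
    if(!$tag->hasAttribute('style'))
        continue;
    // Wrap the declarations in a dummy "p{...}" rule so the parser accepts them
    $oParser = new CSSParser("p{".$tag->getAttribute('style')."}");
    $oCss = $oParser->parse();
    foreach($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    $css = $oCss->__toString();
    // Strip the dummy selector and the braces from the serialized rule
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -2, 2);
    if($css)
        $tag->setAttribute('style', $css);
    else
        $tag->removeAttribute('style'); // Nothing is left once the background is gone
}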
Putting all this code together, suppose you have:
$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';
The code will produce:
<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>