I want to check every tag under the body and remove its style attribute if it has one. I have tried:

$user_submitted_html = "This is Some Text";
$html = '<body>' . $user_submitted_html . '</body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$elements = $dom->getElementsByTagName('body');
foreach($elements as $element) {

   foreach($element->childNodes as $child) {

      if($child->hasAttribute('style')) {

          $child->removeAttribute('style');

      }      
   }  
 }

This works fine when $user_submitted_html contains some tags, but if it is only text, it gives the error:

Call to undefined method DOMText::hasAttribute()

Then I printed the nodeName inside the foreach loop:

echo "Node Name: " . $child->nodeName;

It gives:

Node Name: #text

What kind of node name is this? When I echo other nodes I get div, span and so on, which I am familiar with. I want to know which nodes do not have the hasAttribute method, so I can put a condition before calling hasAttribute, like this:

if($child->nodeName=="#text") {
    continue; // skip to next iteration
}
if($child->hasAttribute('style')) {
.
.
.

Or is there any other solution?

One more suggestion needed: what if I remove the style attribute only from <div>, <span>, <p> and <a>? Will it be safe from XSS if the rest of the tags can still use the style attribute?

Munib
  • This should help understanding the general concept of Nodes: http://stackoverflow.com/questions/4979836/noob-question-about-domdocument-in-php/4983721#4983721. An easier approach than yours would be to use [XPath](http://schlitt.info/opensource/blog/0704_xpath.html) to query the Elements children of the body element having a style attribute directly, e.g. `/html/body/*[@style]` – Gordon Mar 29 '13 at 10:19

2 Answers

I think that instead of checking the nodeName, it would be better to check which class $child is an instance of. Only DOMElement nodes have hasAttribute; text nodes (DOMText, shown as #text) do not.

if ( $child instanceof DOMElement )
{
    //do your stuff
}
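
A minimal sketch of how that check might fit into the loop from the question. The function name `stripBodyChildStyles` is hypothetical and used only for illustration; the `<body>` wrapper follows the question, and the `@` is there only to silence libxml warnings on loose markup.

```php
<?php
// Hypothetical helper, not part of the original question or answer.
function stripBodyChildStyles(string $userSubmittedHtml): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in <body>, as the question does.
    @$dom->loadHTML('<body>' . $userSubmittedHtml . '</body>');

    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        // Text and comment nodes are not DOMElement and carry no attributes,
        // so they are skipped instead of triggering the undefined-method error.
        if ($child instanceof DOMElement && $child->hasAttribute('style')) {
            $child->removeAttribute('style');
        }
    }

    // Serialize only the children of <body>, not the whole document.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo stripBodyChildStyles('This is Some Text'), "\n";
echo stripBodyChildStyles('<div style="color:red">Hi</div> plain'), "\n";
```

Like the original loop, this only looks at the direct children of body; nested elements keep their style attribute.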
n0tiz

You can use XPath to get only the elements that have a style attribute:

$xpath = new DOMXPath($dom);
$elements = $xpath->query('//*[@style]');

foreach($elements as $e) {
    $e->removeAttribute('style');
}  
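
For reference, a self-contained sketch of the same idea that can be run as is. The sample markup in `$html` is made up for illustration; the `@` only suppresses libxml warnings.

```php
<?php
// Sample input, assumed for illustration: nested styled elements plus bare text.
$html = '<body><div style="color:red"><span style="font-weight:bold">a</span></div>text</body>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// '//*[@style]' matches every element, at any depth, that has a style attribute.
// Text nodes are never matched, so no type check is needed.
foreach ($xpath->query('//*[@style]') as $element) {
    $element->removeAttribute('style');
}

// Count the elements that still carry a style attribute after the cleanup.
$remaining = $xpath->query('//*[@style]')->length;
echo $remaining, "\n";
```

Note that `//*[@style]` covers all descendants, whereas an expression such as `/html/body/*[@style]` would restrict the match to direct children of body, as the loop in the question does.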
Maks3w