$string = '<p><a href="http://example.com">Link</a></p>'; // via $_POST['post-content']
$dom = new DOMDocument();
$dom->loadHTML($string);
$allowed_attributes = array('id','href', 'src', 'class', 'style', 'colspan', 'rowspan');
foreach($dom->getElementsByTagName('*') as $node){
    for($i = $node->attributes->length -1; $i >= 0; $i--){
        $attribute = $node->attributes->item($i);
        if(!in_array($attribute->name,$allowed_attributes)) $node->removeAttributeNode($attribute);
    }
}

$html = $dom->saveHTML();

Result:

<p><a href="%5C%22http://example.com%5C%22">Link</a></p>

...

I tried html_entity_decode($html), but it made no difference. What is causing this, and how can I fix it?
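
The %5C and %22 in the result are URL percent-encodings of a backslash and a double quote, not HTML entities, which is why html_entity_decode() leaves them alone. A quick sketch to confirm this, using the href value copied from the result above (the variable names are only for illustration):

```php
<?php
// The corrupted href value from the result above.
$href = '%5C%22http://example.com%5C%22';

// html_entity_decode() only converts entities such as &quot; or &#92;,
// so percent-encoded sequences pass through unchanged.
$afterEntityDecode = html_entity_decode($href);

// rawurldecode() reverses percent-encoding: %5C is a backslash, %22 a double quote.
$afterUrlDecode = rawurldecode($href);

echo $afterEntityDecode, PHP_EOL; // %5C%22http://example.com%5C%22
echo $afterUrlDecode, PHP_EOL;    // \"http://example.com\"
```

Decoding reveals the real value: the link itself is intact, but it is wrapped in backslash-quote pairs, which points at escaped input rather than at the DOM code.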

1 Answer


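I ran into this issue while working on a WordPress filter. In my case the content had already been run through addslashes(), so every double quote in the markup arrived as \". DOMDocument then read the backslashes and quotes as part of the href value, and saveHTML() percent-encoded them (\ as %5C, " as %22). Stripping the slashes before parsing, and adding them back afterwards if needed, fixes it. Applied to the code in the question, it looks something like this: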

// Slashed input, as it arrives via $_POST['post-content']
$string = stripslashes('<p><a href=\"http://example.com\">Link</a></p>');
$dom = new DOMDocument();
$dom->loadHTML($string);
$allowed_attributes = array('id','href', 'src', 'class', 'style', 'colspan', 'rowspan');
foreach($dom->getElementsByTagName('*') as $node){
    for($i = $node->attributes->length -1; $i >= 0; $i--){
        $attribute = $node->attributes->item($i);
        if(!in_array($attribute->name,$allowed_attributes)) $node->removeAttributeNode($attribute);
    }
}

// Add the slashes back if the code downstream (e.g. the WordPress filter) expects slashed data
$html = addslashes($dom->saveHTML());
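
The same steps can be wrapped in a reusable function. This is a minimal sketch, not the code from the filter above: strip_disallowed_attributes() is a hypothetical name, and it assumes the fragment has a single root element, so the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags can stop libxml from adding a DOCTYPE and <html><body> wrappers around it.

```php
<?php
/**
 * Hypothetical helper: keep only whitelisted attributes in slashed HTML,
 * e.g. content that arrived through $_POST or a WordPress filter.
 */
function strip_disallowed_attributes(string $slashedHtml, array $allowed): string
{
    // Remove the slashes first so DOMDocument sees real quotes.
    $html = stripslashes($slashedHtml);

    $dom = new DOMDocument();
    // Assumes a single root element; the flags suppress the implied
    // <html><body> wrapper and the default DOCTYPE.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    foreach ($dom->getElementsByTagName('*') as $node) {
        // Walk backwards so removing an attribute does not skip the next one.
        for ($i = $node->attributes->length - 1; $i >= 0; $i--) {
            $attribute = $node->attributes->item($i);
            if (!in_array($attribute->name, $allowed, true)) {
                $node->removeAttributeNode($attribute);
            }
        }
    }

    // Re-add the slashes because the caller is assumed to expect slashed data.
    return addslashes(trim($dom->saveHTML()));
}

// Single-quoted literal, so \" stays as a literal backslash-quote pair.
$input = '<p onclick=\"alert(1)\"><a href=\"http://example.com\" target=\"_blank\">Link</a></p>';
$result = strip_disallowed_attributes(
    $input,
    ['id', 'href', 'src', 'class', 'style', 'colspan', 'rowspan']
);
echo $result, PHP_EOL;
```

Inside WordPress, wp_unslash() and wp_slash() are the core counterparts of stripslashes() and addslashes() and may be the more natural choice there.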

James