You can do that using a regex, but it's hard to find a good one that always works. See this thread.
What you can use instead is HTMLPurifier. I use it to clean up all posted data. You can choose which tags to keep, which attributes to keep for each tag, etc.
From its description: "HTML filter that guards against XSS and ensures standards-compliant output."
One thing you can do with HTMLPurifier is extend its core classes: for a given tag, you subclass HTMLPurifier_AttrTransform so that a class attribute is added to every element of that tag.
You can check this (quick) example, where the user wants to convert this:
<p>This is a paragraph</p>
<p>Another one</p>
Into this:
<p class="myclass">This is a paragraph</p>
<p class="myclass">Another one</p>
Edit:
Here is a quick and dirty example that you can test on your own:
<?php
require_once 'lib/library/HTMLPurifier.auto.php';

// Attribute transform that appends "myclass" to an element's class attribute.
class HTMLPurifier_AttrTransform_AnchorClass extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        // keep any predefined class and append ours
        if (isset($attr['class']))
        {
            $attr['class'] .= ' myclass';
        }
        else
        {
            $attr['class'] = 'myclass';
        }
        return $attr;
    }
}

$dirty_html = '<p><a href=""></a>
<a target="_blank" href=""></a>
<a href="" class="toto"></a>
<a href="" style="oops"></a></p>';

// Only <a> is allowed, with the href, target and class attributes.
$options = array(
    'HTML' => array(
        'Allowed' => 'a[href|target|class]',
    ),
);
$config = HTMLPurifier_Config::create($options);

// Get the raw HTML definition and attach the transform to <a>.
$htmlDef = $config->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_AnchorClass();

$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
var_dump($clean_html);
It outputs:
string '<a href="" class="myclass"></a>
<a href="" class="myclass"></a>
<a href="" class="toto myclass"></a>
<a href="" class="myclass"></a>' (length=135)
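I use a custom configuration that keeps only some attributes on the <a> tag, which is why style is removed. Note that target is also missing from the output above even though it is in the allowed list: as far as I know, HTMLPurifier only keeps target values that are listed in Attr.AllowedFrameTargets, which is empty by default. You can check the documentation about that.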
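If you want to experiment with which attributes survive, here is a minimal configuration sketch. It assumes the same library path as the example above, and it assumes that Attr.AllowedFrameTargets works the way I understand it from the configuration docs (a list of permitted target values, empty by default). Treat it as something to try, not a definitive setup.

```php
<?php
require_once 'lib/library/HTMLPurifier.auto.php'; // same (assumed) path as above

$config = HTMLPurifier_Config::createDefault();

// Whitelist the <a> tag with three attributes; anything else is stripped.
$config->set('HTML.Allowed', 'a[href|target|class]');

// Assumption: target values must also be listed here to survive purification.
$config->set('Attr.AllowedFrameTargets', array('_blank'));

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a target="_blank" href="" style="oops"></a>');
```

With this configuration, style should still be stripped while target="_blank" should be kept. Depending on your HTMLPurifier version, a rel attribute may also be added to links that have a target.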