Is there any way to make HTML Purifier strip elements that have a certain attribute?

I'm using HTML Purifier to clean up a full webpage into just its basic content so I can index and search it.

I want to be able to add an attribute like data-no-index to some wrapper elements so that they, and everything inside them, are ignored.

This is my HTML Purifier setup:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'h1,h2,h3,h4,h5,h6,p,a[href],ul,ol,li,img[src]');
$purifier = new HTMLPurifier($config);
Petah
  • I have never used Purifier before, but could you just add `tagname[data-no-index]` to your allowed list? – scrowler Dec 19 '13 at 22:26
  • @scrowler I want to deny it, not allow it. – Petah Dec 19 '13 at 22:28
  • ok, how about the `ForbiddenAttributes` option? `$config->set('HTML.ForbiddenAttributes', array('tagname.data-no-index'));` - docs: http://htmlpurifier.org/live/configdoc/plain.html#HTML.ForbiddenAttributes – scrowler Dec 19 '13 at 22:56
  • To be clear, you want the entire element to be removed when the attribute exists? – Edward Z. Yang Dec 20 '13 at 02:18
  • Petah, check this out: http://stackoverflow.com/questions/2638640/html-purifier-removing-an-element-conditionally-based-on-its-attributes (...I guess this is even technically a duplicate, oops. XD) – pinkgothic Dec 20 '13 at 09:17
  • You'll have to do some coding to implement this. There are a few implementation strategies I can think of but nothing out of the box. – Edward Z. Yang Dec 23 '13 at 11:41
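
As the comments note, nothing out of the box does this, so some coding is needed. One possible strategy (not necessarily the one the commenters had in mind) is to remove the marked elements in a pre-processing pass and only then hand the result to HTML Purifier. The sketch below uses PHP's built-in DOM extension rather than any HTML Purifier API; the function name `stripElementsWithAttribute` is hypothetical, and it assumes the input is UTF-8 HTML.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): removes every element
// that carries the given attribute, together with its whole subtree.
// Assumes $html is UTF-8 encoded.
function stripElementsWithAttribute(string $html, string $attribute = 'data-no-index'): string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration is a common hint so libxml reads UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy matches into an array before modifying the tree.
    $matches = iterator_to_array($xpath->query('//*[@' . $attribute . ']'));
    foreach ($matches as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of <body>, dropping the wrapper elements
    // that loadHTML() adds around fragments.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$html  = '<h1>Title</h1><div data-no-index><p>Menu</p></div><p>Content</p>';
$clean = stripElementsWithAttribute($html);
// $clean can then be passed to $purifier->purify() using the $config above.
```

Doing the removal before purification keeps the HTML.Allowed list unchanged: the data-no-index attribute never needs to be whitelisted, because the elements that carry it are already gone by the time HTML Purifier runs.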

0 Answers