Here's the deal: I'm making a project to help teach HTML to people. Naturally, I'm afraid of that Scumbag Steve (see figure 1).

So I want to block ALL HTML tags, except those approved on a very specific whitelist.

Out of those approved HTML tags, I want to remove harmful attributes as well, such as onload and onmouseover, again according to a whitelist.

I've thought of regex, but I'm pretty sure it's evil and not very helpful for the job.

Could anyone give me a nudge in the right direction?

Thanks in advance.


Fig. 1: Scumbag Steve

Madara's Ghost
  • Actually, using regular expressions is the way to go. At least, I would strongly recommend it. They'll give you great flexibility and control over the strings you're parsing. – Deleteman Mar 27 '12 at 20:32
  • The way to go: http://htmlpurifier.org/ – Luca Filosofi Mar 27 '12 at 20:32
  • @Deleteman: Yes, but I've stated I want a **whitelist**, not a **blacklist**, meaning everything is blocked except some specific tags. I don't know how to handle that with RegEx (it would be great if you could throw in a small-scale example). – Madara's Ghost Mar 27 '12 at 20:34
  • @Truth I could, but that htmlpurifier.org posted by aSeptik seems to be your solution :) – Deleteman Mar 27 '12 at 20:36
  • @aSeptik: The following: http://tinyurl.com/c8qwqld should not have removed the input attribute. Why did it? – Madara's Ghost Mar 27 '12 at 20:39
  • @Truth: I can't test it now, but you should definitely test it on your own; it has lots of powerful features. Most probably some tags are not allowed in demo mode. I will post a response with a demo as soon as possible. – Luca Filosofi Mar 27 '12 at 21:11

3 Answers

require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// This setting is needed because otherwise elements considered
// harmful, such as input, are deleted automatically.
$config->set('HTML.Trusted', true);

// Only input, p and div will be accepted.
$config->set('HTML.AllowedElements', 'input,p,div');

// Set the allowed attributes for each tag (tag.attribute).
$config->set('HTML.AllowedAttributes', 'input.type,input.name,p.id,div.style');

// A more extensive way of managing attributes and elements; see the docs:
// http://htmlpurifier.org/live/configdoc/plain.html
$def = $config->getHTMLDefinition(true);

$def->addAttribute('input', 'type', 'Enum#text');
$def->addAttribute('input', 'name', 'Text');

// Create the purifier with this configuration.
$purifier = new HTMLPurifier($config);

// Purify for display; $raw_html is assumed to hold the untrusted input.
$html = $purifier->purify($raw_html);
  • NOTE: as you asked, this code works as a whitelist: only input, p and div are accepted, and only certain attributes are accepted.
Luca Filosofi

Use the Zend Framework 2 StripTags filter. The example below accepts ul, li, p... and img (only with the src attribute) and links (only with the href attribute). Everything else will be stripped. If I'm not mistaken, ZF1 does the same thing.

    $filter = new \Zend\Filter\StripTags(array(
        'allowTags' => array(
            'ul'  => array(),
            'li'  => array(),
            'p'   => array(),
            'br'  => array(),
            'img' => array('src'),
            'a'   => array('href')
        ),
        'allowAttribs'  => array(),
        'allowComments' => false
    ));

    $value = $filter->filter($value);
E Ciotti

For tags, you can use strip_tags.

For attributes, refer to "How can I remove attributes from an html tag?"
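
A minimal sketch of how the two steps could be combined, assuming PHP 7+ with the DOM extension enabled. The `$whitelist` array and the `sanitizeFragment()` helper are hypothetical names made up for illustration, not part of any library. It also only filters attribute names, so values such as `javascript:` URLs in an allowed `href` are not checked, and character encoding handling is left out; a dedicated library like HTML Purifier covers those cases.

```php
<?php
// Hypothetical whitelist: tag name => attributes allowed on that tag.
$whitelist = [
    'p'   => ['id'],
    'div' => ['style'],
    'a'   => ['href'],
];

// Illustrative helper (not a library function): whitelist tags, then attributes.
function sanitizeFragment(string $html, array $whitelist): string
{
    // Step 1: strip_tags() removes every tag not listed in its second argument.
    // Note that it keeps the text that was inside the removed tags.
    $html = strip_tags($html, '<' . implode('><', array_keys($whitelist)) . '>');

    // Step 2: parse the remainder; a wrapper <div> gives a single known root.
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('body')->item(0)->firstChild;

    // Step 3: drop every attribute that is not whitelisted for its element.
    foreach ($root->getElementsByTagName('*') as $element) {
        $allowed = $whitelist[strtolower($element->nodeName)] ?? [];
        // Copy the attribute map first, since it changes while attributes are removed.
        foreach (iterator_to_array($element->attributes) as $attribute) {
            if (!in_array(strtolower($attribute->nodeName), $allowed, true)) {
                $element->removeAttribute($attribute->nodeName);
            }
        }
    }

    // Step 4: serialize the wrapper's children, leaving the wrapper itself out.
    $output = '';
    foreach ($root->childNodes as $child) {
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

$dirty = '<p id="intro" onmouseover="evil()">Hello <b>world</b></p>'
       . '<img src="x.png" onload="evil()">';
echo sanitizeFragment($dirty, $whitelist), PHP_EOL;
```

Parsing with DOMDocument instead of matching attributes with a regex means quoting styles and odd spacing inside tags are handled by the parser rather than by hand-written patterns.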

ahmetunal
  • I don't want to remove all the attributes (some are needed for learning); I want specific ones to be allowed. This doesn't seem to be addressed in your second link. – Madara's Ghost Mar 27 '12 at 20:38