– pinkgothic Sep 21 '16 at 10:55

  • Regarding the attributes, `async` is likely being removed because HTML Purifier doesn't know that attribute. A lot of HTML Purifier's protection stems from the fact that it does HTML-aware whitelisting of tags _and_ attributes. That means it has to know and understand all the HTML you want to use. `async` is an HTML5 attribute, and HTML Purifier does not (I think) support HTML5 yet. You'd have to teach it that attribute yourself. See http://htmlpurifier.org/docs/enduser-customize.html – pinkgothic Sep 21 '16 at 11:02
  • I just want to leave `script` and `iframe` embeds as they are when someone pastes them from websites like YouTube, Twitter, Instagram, Pinterest, Facebook, some local media websites and so on. I've got everything sorted with `iframe`s, but `script` tags either get removed entirely or lose their attributes. How can I leave `script` tags as they are, but fix everything else (like closing tags and other tags' attributes)? I've updated the question to make it clearer what I'm trying to achieve. – Karmalakas Sep 21 '16 at 11:04
  • Yes, I've seen requests for HTML5 support going back to 2011, but there's still no response about it. That's why I tried to add these attributes to the `script` tag in the definition, but I have no idea if I'm doing it right. As I said, I couldn't find a proper example for it. – Karmalakas Sep 21 '16 at 11:07
  • I'm not sure I understand your use-case - do you trust your users not to supply malicious HTML? In that case, are you trying to use HTML Purifier to tidy your HTML rather than sanitise it? HTML Purifier is intended as a security module. If you just want to tidy your HTML, maybe try https://github.com/htacg/tidy-html5 ? – pinkgothic Sep 21 '16 at 11:09
  • Re: `async`, have you tried `$def = $config->getHTMLDefinition(true); $def->addAttribute('script', 'async', 'Enum#async');` (in place of the example at "Add an attribute" of the customize documentation with `target` for `a`-tags)? – pinkgothic Sep 21 '16 at 11:11
  • Maybe Tidy HTML5 would be a solution, but I couldn't figure out how to use it in PHP, if that's even possible. And `$def->addAttribute('script', 'async', 'Enum#async');` should also strip the `async` attribute, because it's just an empty attribute and not `async="async"`. But I'll try it later. – Karmalakas Sep 21 '16 at 11:25
  • Actually, ` – pinkgothic Sep 21 '16 at 11:32
  • Re: Tidy, this might help: http://php.net/manual/en/book.tidy.php – pinkgothic Sep 21 '16 at 11:34
  • Will look into it later and let you know. Thanks – Karmalakas Sep 21 '16 at 11:43
  • HTML Purifier supports the `Bool` attribute type directly, by the way; I managed to miss that in my reading earlier. Armed with that knowledge, I've tossed you an answer. Good luck! – pinkgothic Sep 21 '16 at 11:49
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/123859/discussion-between-pinkgothic-and-cronus). – pinkgothic Sep 21 '16 at 12:31
1 Answer

    Judging from our comment conversation, you appear to be more interested in tidying your HTML than in sanitising it. If so, HTML Purifier is probably not the right tool for you. You could look into PHP's Tidy module (a wrapper for HTML Tidy, see http://php.net/manual/en/book.tidy.php), or check out alternatives.
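
    As a rough illustration of the Tidy route, here is a minimal sketch. It assumes the PHP `tidy` extension is installed; the sample markup and the `example.com` URL are made up for the example. How Tidy treats HTML5-only attributes such as `async` depends on the libtidy version your PHP build links against, so check the output on your own setup.

    ```php
    <?php
    // Assumption: PHP was built with the tidy extension (ext-tidy).
    // Hypothetical input: an unclosed paragraph containing a pasted embed.
    $html = '<p>Unclosed paragraph<script async src="https://example.com/embed.js"></script>';

    $options = [
        'show-body-only' => true, // return a fragment instead of a full document
        'wrap'           => 0,    // do not hard-wrap long lines
    ];

    // tidy_repair_string() returns the repaired markup, or false on failure.
    $clean = tidy_repair_string($html, $options, 'utf8');

    if ($clean === false) {
        echo "Tidy could not repair the input\n";
    } else {
        echo $clean;
    }
    ```

    Note that Tidy only repairs markup. It does not remove malicious content, so this is only appropriate for input you trust.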

    If you do want to teach HTML Purifier to accept `script` tags, `HTML.Trusted` is the right setting. That said, HTML Purifier does not currently support HTML5 and therefore does not understand HTML5-only attributes like `async`. To teach HTML Purifier that attribute, follow the instructions in the end-user "Customize" documentation (http://htmlpurifier.org/docs/enduser-customize.html).

    In your case the first step to take is to add this to your code:

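    $config = HTMLPurifier_Config::createDefault();
    // A customised definition needs its own ID and revision number
    $config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
    $config->set('HTML.DefinitionRev', 1);
    $config->set('Cache.DefinitionImpl', null); // disables definition caching while you experiment; remove this later!
    // Fetch the raw definition so it can be modified
    $def = $config->getHTMLDefinition(true);
    // Allow the boolean attribute async on script
    $def->addAttribute('script', 'async', 'Bool#async');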
    

    This teaches HTML Purifier that `async` is a valid boolean attribute for the `script` tag. If you want to allow the `charset` attribute as well, try:

    $def->addAttribute('script', 'charset', 'Enum#utf-8');
    

    ...or whichever other charsets you want to support, as a comma-separated list (for example `Enum#utf-8,iso-8859-1`). If you want to accept any value:

    $def->addAttribute('script', 'charset', 'CDATA');
    
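    Putting the pieces together, a complete run might look like the sketch below. It assumes HTML Purifier was installed through Composer (the `vendor/autoload.php` path is an assumption; adjust it to however you load the library), and the definition ID, sample markup and `example.com` URL are placeholders invented for the example. `HTML.Trusted` is set before the definition is fetched, because the configuration can no longer be changed once the definition has been retrieved.

    ```php
    <?php
    // Assumption: HTML Purifier is available through Composer's autoloader.
    require_once 'vendor/autoload.php';

    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Trusted', true);                 // allow script tags at all
    $config->set('HTML.DefinitionID', 'script-embeds'); // hypothetical ID, pick your own
    $config->set('HTML.DefinitionRev', 1);
    $config->set('Cache.DefinitionImpl', null);         // no caching while developing

    // Extend the raw definition with the attributes discussed above.
    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('script', 'async', 'Bool#async');
    $def->addAttribute('script', 'charset', 'CDATA');

    $purifier = new HTMLPurifier($config);

    // Hypothetical pasted embed followed by an unclosed paragraph.
    $dirty = '<script async src="https://example.com/embed.js" charset="utf-8"></script><p>Unclosed';

    echo $purifier->purify($dirty);
    ```

    Keep in mind that `HTML.Trusted` switches off much of the protection HTML Purifier exists to provide, so only use it on input from people you trust.
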
    Community
    pinkgothic