
I'm researching PHP security best practices and specifically the HTML Purifier library.

I like the idea of using a third-party library to help strengthen the security of my sites, but I'm confused about a few things...

  1. First, a general question... What does HTML Purifier do that practicing secure PHP programming can't?

  2. If I'm using HTML Purifier, does that mean I get to skip common security measures like using PHP functions to filter input and escape output?

  3. One of the response comments for this question seems to suggest that HTML Purifier is only needed for elements that allow HTML tags, such as WYSIWYG editors. Is this correct?

  4. Has anyone noticed a performance lag from using HTML Purifier? This article makes it seem like performance impact is worth considering.

  5. Are there any up-to-date tutorials on integrating HTML Purifier with a non-framework PHP application? Everything I've found is either old or framework-specific.

Just to confirm that I've done my homework before asking this...

  • This question is essentially the same as mine, but the lone response seems to just list another best practice that the asker forgot to mention

  • This 'bountiful' question is a terrific resource about HTML Purifier and HTML5, but assumes foundational knowledge

  • This comparison page on HTML Purifier's site is more of a comparison to other filters

cantera
  • HTML Purifier's main purpose is to prevent XSS attacks, especially if your site gets some of its content from users. What's your definition of secure PHP programming? Does it include XSS prevention? – bertzzie Jan 25 '12 at 07:17
  • XSS prevention is definitely a huge part of how I define secure PHP programming. I filter/escape and target XSS prevention specifically through liberal use of htmlentities(). Does HTML Purifier's focus on XSS mean that it replaces htmlentities() in some fashion, or would I still continue to use that function? – cantera Jan 25 '12 at 07:54

1 Answer


There are two extremes when accepting any input from your users:

  1. Indiscriminately escape everything to HTML entities, so the user can inject nothing. This is 100% secure, but allows the user no freedom to add any HTML, for example for bolding text and the like.
  2. Output the content as you received it from the user. This allows the user to <b>bold text</b>, but also to inject scripts or mess with your HTML in any other form the user desires, intentionally or unintentionally.
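To make the two extremes concrete, here is a minimal sketch using only built-in PHP. The `$input` string is a made-up submission, and `htmlspecialchars()` stands in for "escape everything" (`htmlentities()` would behave the same way for this input).

```php
<?php
// Hypothetical user submission: harmless markup plus a script.
$input = '<b>bold text</b><script>alert(1)</script>';

// Extreme 1: escape everything. Nothing can be injected, but the <b> tag
// is displayed literally instead of being rendered as bold.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Extreme 2: output as received. The <b> renders, and so does the <script>.
$raw = $input;

echo $escaped, PHP_EOL;
// &lt;b&gt;bold text&lt;/b&gt;&lt;script&gt;alert(1)&lt;/script&gt;
echo $raw, PHP_EOL;
```

Neither variable is what you want if users are meant to format their text: the first loses the bolding, the second keeps the script.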

HTML Purifier allows a middle ground: allow the user to inject some HTML, but not malicious HTML. That's a messy thing to attempt of course, but HTML Purifier is purportedly one of the few libraries, if not the only, that gets it right.

That's the only thing it's supposed to be used for. Don't drop your other security practices. In fact, I'd avoid the whole issue entirely by allowing users to only use a controlled markup language to style their input, such as Markdown (which Stack Overflow uses).
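If you do need the HTML Purifier middle ground in a non-framework application, a rough sketch could look like the following. It assumes the library was installed with Composer (package `ezyang/htmlpurifier`) so that `vendor/autoload.php` exists; the whitelist passed to `HTML.Allowed` is only an illustration, not a recommendation.

```php
<?php
// Assumption: HTML Purifier installed via Composer, autoloader at this path.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Illustrative whitelist: only these elements and attributes are kept.
$config->set('HTML.Allowed', 'b,i,em,strong,p,a[href]');

$purifier = new HTMLPurifier($config);

// Hypothetical submission from a field that is meant to accept some HTML.
$dirty = '<b>bold text</b><script>alert(1)</script>';

// The purified string is intended for direct output as HTML,
// so it is not escaped again afterwards.
$clean = $purifier->purify($dirty);

echo $clean;
```

With a whitelist like this the `<b>` element should survive while the `<script>` element is dropped. Fields that should never contain HTML are still escaped on output as usual; the purifier is only for the fields where markup is allowed.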

deceze
  • Great response. To clarify, are you saying that "HTML whitelisting" is the one and only reason to use HTML Purifier? If so, it's likely not for me - I mainly deal with form inputs where HTML code would never be accepted. If HP's focus really is that narrow, it seems like that should be made clearer on HP's website, or perhaps in a Wikipedia entry. – cantera Jan 25 '12 at 08:05
  • Yes, that's what it does. That's also pretty explicitly expressed on its website IMO: `HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, ...` :-P – deceze Jan 25 '12 at 08:17
  • I honestly missed it, even after watching the Zendcasts tutorial and reading most of the website. Although even after re-reading the snippet you posted, I still interpret it as an "all-purpose purifier" -- not just for wysiwyg and the like. Thanks for clarifying. – cantera Jan 25 '12 at 08:49