
I have an entry form where the user can type arbitrary HTML. What do I need to filter out besides script tags? Here's what I do:

userInput = userInput.replace(/<(script)/gi, "&lt;$1"); // replace() returns a new string, so keep the result

But the sanitizer in WMD (the Markdown editor used here on SO) keeps a whitelist of tags and filters out (blanks) all other tags. Why?

I don't like whitelists because I don't want to prevent the user from entering arbitrary tags if she so chooses; but I can use a more extensive blacklist, besides 'script', if needed. What do I need in a blacklist?

Bambax
  • Blacklisting is really hard, even if you only think about the script tag; check this link: http://ha.ckers.org/xss.html – regilero Sep 02 '11 at 11:58
  • see also recent threads http://stackoverflow.com/questions/7268023/why-should-i-use-bbcode-but-not-html-in-comment-forms/7268195 http://stackoverflow.com/questions/199017/strict-html-validation-and-filtering-in-php/199123 http://stackoverflow.com/questions/7255158/best-way-to-secure-simple-wysiwyg-with-php/7255253 – Cheekysoft Sep 02 '11 at 12:50



Short answer: anything that can do what the script tag does.

The script tag is not required to run JavaScript; script can be embedded in almost any HTML tag. Besides the script tag, it can appear in many other places, including but not limited to URL attributes such as src and href (for example javascript: URLs), event-handler attributes such as onclick and onerror, and the style attribute.
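To illustrate, here is a minimal sketch (plain JavaScript, runnable in Node) that runs the filter from the question over a few payloads that contain no script tag at all. The payload strings are illustrative examples chosen for this sketch, not an exhaustive list; in a browser each would run JavaScript through an attribute.

```javascript
// The blacklist filter from the question: it only neutralizes "<script".
function blacklistFilter(userInput) {
  return userInput.replace(/<(script)/gi, "&lt;$1");
}

// Illustrative payloads without any <script> tag.
const payloads = [
  '<img src="x" onerror="alert(1)">',           // event-handler attribute
  '<a href="javascript:alert(1)">click me</a>', // javascript: URL in href
  '<body onload="alert(1)">',                   // another event handler
];

for (const p of payloads) {
  // The filter returns every one of these strings unchanged.
  console.log(blacklistFilter(p) === p); // true
}
```

Every line prints true: the blacklist never sees anything it recognizes, so the markup reaches the page intact.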

The ability for a user to inject unwanted script into your page is a security vulnerability known as cross-site scripting (XSS). Read up on this topic, starting with the XSS prevention cheat sheet.
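One of the standard defences against XSS is to HTML-escape untrusted input before placing it in the page, so the browser shows it as text instead of parsing it as markup. A minimal sketch, assuming the input goes into element content or a quoted attribute value; `escapeHtml` is a hypothetical helper name, not a built-in API. Note that this does not let users enter HTML at all: it is the baseline the other options build on.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML.
// Safe for element content and quoted attribute values; URL, JavaScript and
// CSS contexts need their own rules (e.g. a javascript: URL survives escaping).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const userInput = '<img src="x" onerror="alert(1)">';
console.log(escapeHtml(userInput));
// &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
```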

You may not want to let users add HTML to your pages at all. If you need rich formatting, consider another format such as Markdown, using a processor that lets you disable embedded HTML entirely. Another, less secure, option is a filtering library that tries to remove all script, such as HTMLPurifier. If you choose filtering, subscribe to the library's release announcements and promptly upgrade your project to each bug-fixed release as new exploits are found and worked around.

Cheekysoft
  • Markdown allows HTML; that's the point (and WMD, which I mention in my question, is SO's Markdown editor). So using Markdown == using HTML (when previewing Markdown input). – Bambax Sep 02 '11 at 12:53
  • The main thing to take away is that implementing a blacklist is not a good idea from a security point of view: missing a single attack vector from the list means you can still be exploited. HTMLPurifier is a well-maintained filtering project that aims to solve this problem. Look at the size of its code base and know that this is *hard*. Note that HTMLPurifier was found to be vulnerable to XSS in 2010 and 2011, and constant research and maintenance goes into it. The XSS cheat sheet from regilero's comment shows how diverse attack vectors can be. – Cheekysoft Sep 02 '11 at 13:05
  • I had never really studied XSS seriously (as is probably obvious from my question ;-), but just reading the XSS prevention cheat sheet you linked to is fascinating. I see how blacklisting is hard, if not impossible (attackers' imagination is endless), and will implement whitelisting instead. – Bambax Sep 02 '11 at 13:37
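A whitelist can be sketched very simply if tags never carry attributes: escape everything, then restore only exact, attribute-free tags whose names are on the list. This is a minimal illustration, not how WMD or HTMLPurifier work internally; `ALLOWED_TAGS`, `escapeHtml` and `sanitize` are hypothetical names chosen for this sketch.

```javascript
// Hypothetical whitelist of formatting tags. Attributes are never allowed,
// which rules out event handlers, javascript: URLs and style attributes.
const ALLOWED_TAGS = ["b", "i", "em", "strong", "code", "p", "br"];

function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, to avoid double-escaping
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape all input, then turn back only "&lt;name&gt;" / "&lt;/name&gt;"
// where name is whitelisted; anything with attributes stays escaped.
function sanitize(userInput) {
  const escaped = escapeHtml(userInput);
  const tagPattern = new RegExp(
    "&lt;(/?)(" + ALLOWED_TAGS.join("|") + ")&gt;", "gi");
  return escaped.replace(tagPattern,
    (match, slash, name) => "<" + slash + name.toLowerCase() + ">");
}

console.log(sanitize('<b>bold</b> <img src="x" onerror="alert(1)">'));
// <b>bold</b> &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
```

This sketch does not check that tags are balanced, and it cannot allow links or images; supporting those means validating attribute values such as URL schemes, which is where a maintained library is a safer choice than extending it by hand.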