
I need to prevent XSS attacks as much as possible and in a centralized way so that I don't have to explicitly sanitize each input.

My question: is it better to sanitize all inputs at the URL/request-processing level, to encode/sanitize inputs before serving them, or to do it at the presentation level (output sanitization)? Which one is better, and why?

Badr Ghatasheh
  • Sanitize only when necessary: DON'T do this work up front (e.g. DON'T save a value that has been passed through htmlspecialchars in a database). Do it only when you need to print it to output. [Twig](http://twig.sensiolabs.org/) escapes automatically, so you don't have to worry about this aspect of your application. – Federkun Feb 19 '13 at 08:54
  • Thanks, but why not? What's the problem with having the data already cleaned up before inserting it into the database? I'm trying to centralize it to reduce the effort. – Badr Ghatasheh Feb 19 '13 at 09:22
  • possible duplicate of [What are the best practices for avoiding xss attacks in a PHP site](http://stackoverflow.com/questions/71328/what-are-the-best-practices-for-avoiding-xss-attacks-in-a-php-site) – Quentin Feb 19 '13 at 09:26
  • @BadrGhatasheh: You never know what you are going to want to do with the data in the future. Need to put it into a PDF? Or an email? Then you have to convert it from HTML back to text. Also, doing it anywhere other than just in time makes it hard to spot where it should be done, leading to more chance of forgetting it or of double escaping. – Quentin Feb 19 '13 at 09:27
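The double-escaping and "convert it back to text" problems from the comments above are easy to reproduce. This is a minimal Python sketch, with Python chosen only for illustration and the standard-library `html.escape` standing in for PHP's `htmlspecialchars`; the helper name `escape_for_html` is hypothetical.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape text at output time, right where it is written into HTML."""
    return html.escape(text)


# Value stored exactly as the user typed it (no escaping at input time).
raw = "Fish & Chips <b>"

# Escaping once, at output, gives correct HTML.
once = escape_for_html(raw)

# If the value had already been escaped before storage, escaping it again
# at output corrupts it: "&" turns into "&amp;amp;" and shows up literally.
pre_escaped = html.escape(raw)
twice = escape_for_html(pre_escaped)

# A pre-escaped value is also wrong for non-HTML outputs (plain-text email,
# PDF), so it has to be unescaped again before it can be used there.
plain_text_for_email = html.unescape(pre_escaped)

print(once)
print(twice)
print(plain_text_for_email)
```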

1 Answer

There are two areas where you need to be aware:

  1. Anywhere you use input as part of a script in any language, most notably SQL. In the particular case of SQL, the only recommended approach is to use parameterized queries (which results in unescaped content being stored in the database, but only as strings: that's ideal). Anything involving the magic quoting of characters before substituting them directly into the SQL string is inferior, because it is so easy to get wrong. Anything that can't be done with a parameterized query (for example, a table or column name) is something that a service secured against SQL injection should never allow a user to specify.

  2. Anywhere you present something that was input as output. The input may arrive directly (including via a cookie) or indirectly (via the database or a file). In this case, your default approach should be to make the text that the user sees exactly the text that was input. That's very easy to implement correctly: in ordinary element content the only characters you actually have to quote are < and &, and you can wrap it all in <pre> for display. (Inside attribute values, quote characters must be escaped as well.)
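The two rules above can be sketched together in a few lines. This is a minimal Python illustration, not a prescribed implementation: `sqlite3` stands in for whatever database you use, and the `comments` table and the `save_comment`/`render_comment` helpers are hypothetical names. The query uses a placeholder so the input is stored verbatim as a string, and quoting happens only when the text is rendered.

```python
import html
import sqlite3


def save_comment(conn: sqlite3.Connection, body: str) -> None:
    # Parameterized query: the driver passes `body` as data, never as SQL,
    # so it is stored exactly as the user typed it.
    conn.execute("INSERT INTO comments (body) VALUES (?)", (body,))


def render_comment(body: str) -> str:
    # Output-time quoting. In element content only '<' and '&' strictly need
    # escaping; html.escape(..., quote=False) also escapes '>', which is harmless.
    return "<pre>" + html.escape(body, quote=False) + "</pre>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

malicious = "Robert'); DROP TABLE comments;-- <script>alert(1)</script>"
save_comment(conn, malicious)

# The stored value is the raw input; it only becomes HTML-safe when rendered.
stored = conn.execute("SELECT body FROM comments").fetchone()[0]
page = render_comment(stored)
print(page)
```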

But that's often not enough. For example, you might want to let users apply some sort of formatting. This is where it is ever so easy to go wrong. The simplest approach in this case is to parse the input and detect all the formatting instructions; everything else must be quoted properly. Store the formatted version in the database as an additional column so that you don't need to do much work when returning it to the user, but also store the original version that the user input so you can search over it. Do not mix them up! Really! Audit your application to make totally sure that you get this right (or, better yet, get someone else to do the audit).
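A minimal sketch of the "parse the formatting, quote everything else, store both versions" approach, in Python. The markup (`*bold*` and `_italic_`), the `posts` table and its `source`/`rendered` columns, and the helper names are all hypothetical choices for illustration; a real formatting language would need a proper parser and the audit the answer recommends.

```python
import html
import re
import sqlite3

# Hypothetical minimal markup: *bold* and _italic_. Anything else is plain text.
_TOKEN = re.compile(r"\*([^*\n]+)\*|_([^_\n]+)_")


def render_markup(source: str) -> str:
    """Convert recognised formatting to HTML; escape everything else."""
    out = []
    pos = 0
    for m in _TOKEN.finditer(source):
        # Text between formatting instructions is quoted, never trusted.
        out.append(html.escape(source[pos:m.start()], quote=False))
        if m.group(1) is not None:
            out.append("<strong>" + html.escape(m.group(1), quote=False) + "</strong>")
        else:
            out.append("<em>" + html.escape(m.group(2), quote=False) + "</em>")
        pos = m.end()
    out.append(html.escape(source[pos:], quote=False))
    return "".join(out)


def save_post(conn: sqlite3.Connection, source: str) -> None:
    # Keep both: the original input (for searching) and the rendered HTML (for display).
    conn.execute(
        "INSERT INTO posts (source, rendered) VALUES (?, ?)",
        (source, render_markup(source)),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, source TEXT, rendered TEXT)")

save_post(conn, "This is *important* <script>alert(1)</script> & _fun_")
source, rendered = conn.execute("SELECT source, rendered FROM posts").fetchone()
print(rendered)
```

Searching then runs against `source` (for example `WHERE source LIKE ?`), while pages are built only from `rendered`. If the rendering rules change, the `rendered` column can be regenerated from `source`.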

But everything about being careful with SQL still applies, and there are many HTML tags (e.g., <script>, <object>) and attributes (e.g., onclick) that are never ever safe.
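If you do accept some HTML directly, one way to make sure tags such as <script> and <object> and attributes such as onclick never get through is an allowlist: rebuild the markup keeping only a few known tags and no attributes at all. This is a minimal sketch using Python's standard `html.parser`; the `ALLOWED_TAGS` set and the `sanitize` helper are hypothetical, and a hand-rolled sanitizer like this is for illustration only. A maintained sanitization library for your language, plus an audit, is the safer choice in practice.

```python
import html
from html.parser import HTMLParser

# Hypothetical allowlist: only these tags survive, and never with attributes,
# so <script>, <object>, onclick=..., href="javascript:..." etc. are all dropped.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are always discarded

    def handle_startendtag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # All text, including the body of a dropped <script>, is escaped.
        self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="steal()">Hi <b>there</b><script>alert(1)</script></p>'))
```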


You were looking for advice about specific packages to do the work? You really need to pick a language then. The above advice is all totally language-independent. Add-on packages/libraries can make many of the steps above really easy in practice, but you still absolutely need to be careful.

Donal Fellows
  • Thank you for the great answer, it's just what I needed. Actually, I never intended it to be a language-specific question; for what it's worth, I'm using TCL as the implementation language. – Badr Ghatasheh Feb 19 '13 at 09:49