What constitutes 'user input' when dealing with XSS attack prevention?

Question

I am looking to secure my code against XSS attacks, yet all of the examples I have been reading deal with direct user input validation (such as in a contact form or a login).

I'm a bit confused as to whether I need to protect my code if there is no way to input data directly (i.e., my website only reads from a database and never writes to it). I'm still thinking I need to, because I class my database as an external source, and the data in the variables I echo still comes from elsewhere.

Am I right in thinking that any data read still constitutes user input and should be treated accordingly? Also, if I then added a contact form, would I need to validate/sanitise/escape every piece of information pulled from my database on every page, or only deal with it at the form itself?

Actually, XSS is not about user input but about output: anything retrieved from anywhere and written to the HTML page. — zerkms, Oct 07 '12 at 20:26
See: http://stackoverflow.com/questions/5414962/protection-against-xss-exploits — Adam, Oct 07 '12 at 20:30
@WesleyMurch thanks for that - I had been reading it but still getting confused. Might be a good idea to just keep reading it over. — tomdot, Oct 07 '12 at 20:36
@Adam that doesn't help much. I know XSS is bad and I am learning how to prevent it; the question I am asking is what constitutes user input. Obviously there is direct user input, but does the data I have entered into my database myself also constitute user input? — tomdot, Oct 07 '12 at 20:39
That's a better way to think about it: essentially, if I'm calling a query, then that data is really unknown, isn't it? Unknown to the code, anyway. Well, that's how it seems to me. — tomdot, Oct 07 '12 at 20:42
@WesleyMurch If you put all of the above into an answer I will accept it as correct — tomdot, Oct 07 '12 at 20:47
Including but not limited to user form submissions, cookies, HTTP headers, and variables from the request ($_GET, $_POST, etc.). — AKS, Oct 07 '12 at 22:27

score 4 · Accepted Answer · answered Oct 07 '12 at 20:53

Forget the term "user input" and think in terms of "unknown strings". Anything that you do not know for a fact what it contains is potentially dangerous or disruptive in the right context.
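Applied to the database case from the question, a minimal Python sketch might look like this. It assumes an in-memory SQLite table with a hypothetical `articles(title)` column: the stored title is treated as an unknown string even though it comes from "our own" database, and it is escaped with `html.escape` at the moment it is written into the page (in PHP, `htmlspecialchars` plays the same role).

```python
import sqlite3
from html import escape

# In-memory database standing in for "my database" (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT)")
# Whoever wrote this row (you, an import script, a future contact form),
# the page-rendering code cannot know for a fact what it contains.
conn.execute(
    "INSERT INTO articles (title) VALUES (?)",
    ("Tips & tricks <script>alert('xss')</script>",),
)


def render_titles(conn):
    """Render every stored title as HTML text content, escaping on output."""
    rows = conn.execute("SELECT title FROM articles").fetchall()
    items = [f"<li>{escape(title)}</li>" for (title,) in rows]
    return "<ul>" + "".join(items) + "</ul>"


print(render_titles(conn))
```

The escaping happens in the rendering function, not when the row is inserted, so the same stored value can later be reused safely in other contexts.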

It's also important to remember that there is no single solution for all cases. For example, each of these may require a different type of sanitizing or escaping:

HTML attributes: <a href="$unknown">
HTML text content: <p>$unknown</p>
JavaScript: <script>var B = $unknown;</script>
SQL: SELECT * from $unknown
CSS: .myClass { color:$unknown; }

In general, you should (if possible) avoid using unknown data in HTML attributes, CSS, or JavaScript, because those are the places where it gets complicated. In most cases, simply escaping the HTML special characters is all you need to do.
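A small Python sketch of why the attribute case is harder than the text case. The helper names `text_node` and `link`, and the rule that only `http`/`https` links are allowed, are illustrative assumptions. For HTML text content, `html.escape` is enough; for an `href` attribute, escaping the quotes is still needed, but it does not stop a `javascript:` URL, so the scheme is checked as well.

```python
from html import escape
from urllib.parse import urlsplit

# Assumption: only absolute web links are wanted (e.g. links to external sites).
ALLOWED_SCHEMES = {"http", "https"}


def text_node(unknown: str) -> str:
    # HTML text content: escaping &, <, > (and quotes) is sufficient here.
    return "<p>" + escape(unknown) + "</p>"


def link(unknown_url: str, label: str) -> str:
    # Attribute context: escaping alone would leave "javascript:..." URLs
    # clickable, so check the scheme first, then escape including quotes.
    scheme = urlsplit(unknown_url.strip()).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        unknown_url = "#"  # fall back to a harmless target
    return f'<a href="{escape(unknown_url, quote=True)}">{escape(label)}</a>'


print(text_node("<b>bold?</b>"))
print(link("javascript:alert(1)", "click me"))
print(link("https://example.com/?a=1&b=2", "example"))
```

Relative URLs such as `/about` have no scheme and are also rejected by this sketch; a real site would likely need a more deliberate policy.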

The key word here is context, which is one reason you never want to "sanitize" input, only output. The same data could be used in different contexts and require different measures of escaping or filtering.
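To illustrate "same data, different contexts", here is a Python sketch that writes one string into both HTML text and a JavaScript variable (the `var B` case from the list). The helper names are hypothetical. For the script block it uses `json.dumps` to produce a valid JavaScript string literal and then escapes `<`, `>` and `&` as `\u003c`-style sequences, so a value containing `</script>` cannot close the block. This is one common approach, not the only one.

```python
import json
from html import escape


def for_html_text(value: str) -> str:
    # HTML text context: entity-escape the special characters.
    return escape(value)


def for_js_string(value: str) -> str:
    # JavaScript context: json.dumps gives a quoted, escaped string literal
    # (non-ASCII, including U+2028/U+2029, becomes \uXXXX by default).
    literal = json.dumps(value)
    # Also hide <, > and & so "</script>" cannot end the script block early.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


value = "O'Reilly </script><script>alert(1)</script>"
page = (
    f"<p>{for_html_text(value)}</p>\n"
    f"<script>var B = {for_js_string(value)};</script>"
)
print(page)
```

Note that the HTML-escaped form would be wrong inside the script (the JavaScript would see `&lt;` literally), and the JSON form would be wrong inside the paragraph, which is exactly why escaping belongs at output time.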

I highly suggest using OWASP as a resource to learn about XSS and security in general: https://www.owasp.org/index.php/Cross-site_Scripting_(XSS)

So, the direct answer to "What constitutes 'user input' when dealing with XSS attack prevention?" is: *everything* you put in your HTML or JavaScript, no exceptions. (My SQL example is not XSS-related at all, but it is still relevant for understanding context.) — Wesley Murch, Oct 07 '12 at 21:01

score 1 · Answer 2 · answered Oct 07 '12 at 20:27

Am I right in thinking that any data read still constitutes user input and should be treated accordingly?

In general, yes. Most databases contain mostly plain text and numbers, and plain text has to be escaped before it is placed into an HTML page.

There are exceptions, though. For instance, if you are explicitly storing HTML in there, and making sure that it is safe (or at least trusted) when you enter it, then you don't need to worry about protecting yourself from XSS when you pull the data out. An example of this would be a CMS (such as WordPress) that allows users to enter HTML into articles.
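"Making sure it is safe when you enter it" usually means running stored HTML through an allowlist sanitizer before saving it. The Python sketch below is only an illustration of that idea using the standard library's `html.parser`; the allowed tag set and the names `AllowlistSanitizer` and `sanitize_on_input` are assumptions, and in practice a vetted, maintained sanitizer library is generally preferable to a hand-rolled one.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: articles may only use these tags, and never any attributes.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}


class AllowlistSanitizer(HTMLParser):
    """Rebuild the HTML, keeping only allowed tags and re-escaping all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes (onclick, style, href, ...) are always dropped.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text, including the body of a dropped <script>, stays inert text.
        self.parts.append(escape(data))

    # Comments, doctypes and processing instructions are ignored by the
    # base class handlers, so they are simply not emitted.


def sanitize_on_input(raw_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(raw_html)
    parser.close()
    return "".join(parser.parts)


print(sanitize_on_input('<p onclick="steal()">Hi <script>alert(1)</script><b>there</b></p>'))
```

Only the sanitized result would be stored, which is what makes it reasonable to output that one field without escaping later.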

Also, if I then added a contact form, would I need to then validate/sanitise/escape every piece of information pulled from my database in every page, or only deal with it at the form itself?

The form allows data from outside the system to be entered. You need to take whatever measures are suitable wherever you put that data. If you put it into a string of SQL, then you need to escape it for SQL. If you put it into an email subject, then you need to escape it for that. If you put it into an HTML document, then you need to escape it for that. (And so on.)
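A Python sketch of one contact-form message going to three destinations, each handled for its own context. The table, column and function names are hypothetical. For SQL it uses a parameterized query, which is the usual alternative to escaping by hand; for the email subject it simply rejects CR/LF (which could otherwise inject extra headers) rather than escaping; for HTML it uses `html.escape`.

```python
import sqlite3
from html import escape


def save_message(conn, subject: str, body: str) -> None:
    # SQL context: a parameterized query keeps the data out of the SQL text.
    conn.execute(
        "INSERT INTO messages (subject, body) VALUES (?, ?)", (subject, body)
    )


def email_subject(subject: str) -> str:
    # Email header context: a line break would let the sender add headers.
    if "\r" in subject or "\n" in subject:
        raise ValueError("line breaks are not allowed in a subject")
    return subject


def message_as_html(subject: str, body: str) -> str:
    # HTML context: escape when the stored message is shown on a page.
    return f"<h2>{escape(subject)}</h2><p>{escape(body)}</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (subject TEXT, body TEXT)")
save_message(conn, "Hello <b>", "It's me'); DROP TABLE messages;--")
for subject, body in conn.execute("SELECT subject, body FROM messages"):
    print(message_as_html(subject, body))
```

The data is stored exactly as submitted; each destination applies its own rule at the point where the data is used.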

Yes, I was also thinking about storing specific HTML, because I will have fields containing links to external sites. This is helpful, thanks. — tomdot, Oct 07 '12 at 20:44
