SQL Injection
The universal answer to SQL Injection problems is “never send any user input to the database as part of an SQL string”. Anything that can go as a parameter should do so. Thus, instead of (in some dialect that might not exactly match what you're looking at):
db eval "SELECT userid FROM users WHERE username = '$user' AND password = '$pass'"
you do:
db eval "SELECT userid FROM users WHERE username = ? AND password = ?" $user $pass
# I personally prefer to put SQL inside {braces}… but that's your call
The key is that because the database engine just understands that these are parameters, it never tries to interpret them as SQL. Injection Impossible. (Unless you're using badly-written stored procedures.)
It gets much more complex where you want to have a table or column name specified by a user. That's a case where you can't send it as a parameter; such SQL identifiers must be interpreted by the SQL engine. Your only alternatives there are to either remap from user-supplied terms to ones that you control, or to rigorously validate.
Remapping is done by having a separate trivial table that maps from user-supplied names to ones you've generated:
db eval {SELECT realname FROM namemap WHERE externalname = ?} $externalname
Because the generated name is easy to guarantee to be free of nasty characters and not to be one of SQLs keywords, it can be safely used in SQL text without further quoting. You can also try doing the mapping per request (factor out the mapping code to a procedure of course) by stripping all bad characters from it. A suitable regsub
might be:
regsub -all {\W+} $externalname "" realname
but then we need additional checks to see that it isn't “evil”:
# You'll need to create an array, SQLidentifiers, first, perhaps like:
# array set SQLidentifers {UPDATE - SELECT - REPLACE - DELETE - ALTER - INSERT -}
# But you can do that once, as a global "constant"
if {[regexp {^\d} $realname] || [info exist SQLidentifiers([string toupper $realname])]} {
error "Bad identifier, $externalname"
}
As you can see, it's a good idea to factor out such transforms and checks into their own procedure so you get them right, once.
And you must test your code extensively. I cannot stress that hard enough. Your tests must try really hard to break things, to make SQL injections via every possible field that anyone could pass into the software; not one of them should ever result in anything happening that your code ever expects.
It's probably a good idea to get someone else to write at least some of the tests; experience from the security community suggests that it is relatively easy to write code that you can't break yourself, but much harder to write code that someone else can't break. Also consider doing fuzz testing, sending computer-generated random data at the interface. In all cases, either things should give a graceful error or should succeed, but never ever cause the application to outright fail.
(You might well allow highly-authenticated users — system/database administrators — to outright specify SQL to evaluate so they can do things like setting the system up, but they're the minority case.)
Cross-site Scripting
This is actually conceptually quite similar: it's caused (principally) by someone putting something in your site that unexpectedly gets interpreted as HTML (or CSS, or Javascript) rather than as human-readable text (with SQL injection, it's something getting interpreted as SQL rather than as data). Because you can't do the equivalent of parameterised queries when going back to the client, you have to use careful quoting. You're strongly recommended to do the careful quoting by using a proper templating library that constructs a DOM tree (with data coming from users or from the database being only ever inserted as text nodes).
If you want users to supply a marked up piece of text, consider either delivering it back as plain text before using Javascript to render it as, say, Markdown, or completely parsing the user-supplied text on the server to construct a model (e.g., DOM tree) of what should be delivered, before sending it back as HTML generated from that model.
You must not allow users to specify a location where you load a script or frame from. Even allowing them to specify links is worrying, but you probably have to permit that if you can't restrict things to straight plain text. (Consider adding a mechanism for listing all links that have been supplied by users. Consider marking all external links with rel=nofollow
unless you can positively detect that they go to somewhere that you whitelist.)
Direct supply of HTML is a “highly-authenticated users only” operation.
(I told a lie above. You can do the equivalent of SQL parameterised queries. You write JS that the client executes to fetch the user data using an AJAX query, perhaps serialized as JSON, and then do DOM manipulations there to render it; in effect, you're moving the DOM construction from the server to the client, but you're still doing DOM construction as that's the core of how you get this right. You have to remember to never insert the things retrieved as straight HTML though. Clients must not trust the server too much.)
The comments I made above above about testing apply here too. With testing for XSS, you're looking to inject something like <script>alert("boom!")</script>
; any time you can get that in and cause a popup dialog — except by being a system administrator with direct permission to edit HTML directly — you've got a massive dangerous hole to plug. (It's quite a good thing to try to inject, as it is very noticeable and yet fairly benign in itself.)
Don't try to just filter out <script>
using regular expressions. It's far too hard to get that right.