Sanitizing user data
What is necessary when it comes to sanitizing data - specifically, in this case, user input data - depends on a multitude of criteria including (but not limited to) the following:
- The type of data expected (e.g. string, integer, binary)
- Where the data is to be stored (e.g. database, text file)
- How the data is to be used (e.g. displayed back to users, server side calculations)
- Well, you get the gist...
Expected data type
The expected data type matters a great deal when sanitizing user input, because it can make sanitization either very easy or considerably more complicated.
Typical expected input types would be:
- Text
- Integer
- Decimal
- Binary
In the event that the input is simply an integer or decimal number, you can easily sanitize it by making sure that the input is converted to the expected data type.
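As a rough sketch of that conversion: a plain cast such as (int) $value always produces a number, while filter_var() additionally lets you reject input that is not a valid number at all. The helper names sanitize_int and sanitize_decimal are made up for this illustration.

```php
<?php
// Hypothetical helpers: return the converted value, or null when the
// input is not a valid number of the expected type.
function sanitize_int($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

function sanitize_decimal($raw): ?float
{
    $value = filter_var($raw, FILTER_VALIDATE_FLOAT);
    return $value === false ? null : $value;
}

var_dump(sanitize_int('42'));        // int(42)
var_dump(sanitize_int('42; DROP'));  // NULL
var_dump(sanitize_decimal('3.14'));  // float(3.14)
```

Whether you reject invalid input (as above) or silently cast it depends on your use case.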
When text input is expected, you are likely to want to escape or strip certain characters/strings to protect against things like XSS and SQL injection.
In the event of binary data being input, you are going to want to make sure that the data is safe. You may wish to scan it for malicious code and set the correct permissions on the file to prevent code from running on your server. As with text, you may also need to escape certain characters.
Storage location
Storage location also impacts what sanitization is necessary. Taking the example of database vs file:
When entering data into a database you need to escape certain characters to protect your system from people attempting SQL injection. Conversely, when not using a database you aren't using SQL, so protecting specifically against SQL injection is not required.
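One common alternative to escaping by hand is a prepared statement, where the query and the data are sent separately. The sketch below swaps in PDO for that purpose and assumes the SQLite driver is available; the comments table and the input string are invented for the example.

```php
<?php
// Assumption: PDO with the SQLite driver is installed. An in-memory
// database keeps the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical hostile input.
$userInput = "Nice post'); DROP TABLE comments; --";

// The placeholder keeps the input out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $userInput]);

// The text comes back verbatim and the table still exists.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
var_dump($stored === $userInput);
```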
However, that doesn't mean you can be relaxed when not saving to a database. Saving to files can also lead to malicious code being uploaded, and steps should be taken to prevent files that have been uploaded or created to store information from being executable on the server. This isn't limited to code where you are specifically allowing upload of a file:
<input type="file" name="uploadFile">
It also applies to, for example:
file_put_contents($filename, $uploaded_data);
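As a minimal sketch of the idea, assuming a storage directory that sits outside the web root (the system temp directory stands in for it here), you could write the data under a server-generated name and make sure no execute bit is set. The function name store_user_data is hypothetical.

```php
<?php
// Hypothetical helper: store user-supplied data under a server-generated
// name in a directory assumed to be outside the web root.
function store_user_data(string $data, string $storageDir): string
{
    // Never reuse a user-supplied file name or extension.
    $path = $storageDir . DIRECTORY_SEPARATOR . bin2hex(random_bytes(16)) . '.dat';

    if (file_put_contents($path, $data, LOCK_EX) === false) {
        throw new RuntimeException('Could not write user data');
    }

    // Read/write for the owner, read-only for everyone else, no execute bit.
    chmod($path, 0644);

    return $path;
}

$path = store_user_data('<?php phpinfo();', sys_get_temp_dir());
echo $path, "\n";
```

File permissions alone won't stop a web server from interpreting a .php file it can read, which is why the generated extension and the location outside the web root matter too.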
Data use
The purpose of allowing the user to input data is also going to play a role in deciding what protection you need to apply to the user input. While input can be used for many different reasons the main (or most common) reasons are likely to be:
- To be displayed back to users (e.g. forum posts, comments, images)
- To be used in calculations on site (e.g. calorie counters, insurance quotes)
If the data is to be displayed back to the user, you need to think about protection from attacks like XSS as well as stopping users from visually defacing your site, both of which can be carried out by injecting tags such as: <script>...</script>
On the other hand, if the data is not going to be shown back to users in its current form then XSS etc. is likely to be somewhat irrelevant.
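For the display case, a small sketch of encoding on output with htmlspecialchars might look like this. The wrapper name e is just a convenience invented for the example.

```php
<?php
// Hypothetical helper: encode a value for output inside HTML text
// or a quoted attribute.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("XSS")</script>';
echo e($comment), "\n";
// Prints: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
```

The browser then shows the tag as text instead of running it.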
When to sanitize/format
Sanitization of data should be taken care of before you do anything with it server side that could be affected by dodgy code (e.g. before inserting into a database).
You might choose to sanitize just before you use the data, or you may prefer to sanitize it as soon as your script starts. This is largely down to personal preference and the use case of the system in question.
However, there is often debate on when to run functions like htmlspecialchars (i.e. before saving to the database vs before displaying to the user). This, again, is down to personal preference and how you are using the data. There are pros and cons to both methods, which I won't elaborate on too much.
However, if you store the user data in the database raw (but safe), you are then free to change how you sanitize data over time. Whereas if you start storing data after using functions like htmlspecialchars and strip_tags etc., you are removing/altering some data from the input that you might decide at a later date you want to allow/include, only to realise that you have already lost the data. For example:
strip_tags will by default strip all tags from the input, which might seem like a good idea at the moment, but further down the line you might decide that you actually want to allow some tags like <b> or <i> - but they are no longer present in any inputs that have already been saved.
Of course, if you are using your own markup or BBCode (or similar) for formatting, then running strip_tags etc. before saving to a database would be perfectly reasonable. Similarly, if you only want to store plain text then stripping everything that isn't plain would also be reasonable. Again, it depends on your use case...
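To illustrate the flexibility you keep by storing the raw input, here is a small sketch that applies strip_tags at display time, first with the default behaviour and then with an allow-list. The $stored value is an invented example of what might come out of the database.

```php
<?php
// Hypothetical raw value as it was saved.
$stored = '<b>Bold</b> and <i>italic</i> <script>alert(1)</script>';

// Default: every tag is removed (the text between tags is kept).
echo strip_tags($stored), "\n";
// Prints: Bold and italic alert(1)

// Later decision: allow <b> and <i>, still strip everything else.
echo strip_tags($stored, '<b><i>'), "\n";
// Prints: <b>Bold</b> and <i>italic</i> alert(1)
```

Bear in mind that strip_tags leaves attributes on allowed tags untouched, so an allow-list on its own is not complete XSS protection.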
Your code
See comments for a simple explanation of what each function does.
$mots = mysql_real_escape_string( // Escapes certain characters to 'sanitize' for input to database
    stripslashes( // Removes any escape slashes (e.g. those added to $_POST by magic quotes)
        strip_tags( // Removes any tags present in the text e.g. <b></b>
            htmlspecialchars( // Converts special characters like < to HTML entities like &lt;
                $_POST['mots']
            )
        )
    )
);
Problems
The most glaring problem with your code is the order in which you are applying the functions. strip_tags should always come before htmlspecialchars (or any other function that will encode characters). The reason for this is simple:
$_POST['userinput'] = "<b>Some user input</b>"; // Input uploaded from form
echo strip_tags($_POST['userinput']);
// Output: Some user input
echo htmlspecialchars($_POST['userinput'], ENT_QUOTES, 'UTF-8');
// Output: &lt;b&gt;Some user input&lt;/b&gt;
echo strip_tags(htmlspecialchars($_POST['userinput'], ENT_QUOTES, 'UTF-8'));
// Output: &lt;b&gt;Some user input&lt;/b&gt;
In the above code example you can see that if you run htmlspecialchars before strip_tags, the < and > symbols are converted to their respective HTML codes (&lt; and &gt;) and are not stripped... So essentially, strip_tags is useless. However, if you run it first then you will strip the tags as required and encode any other miscellaneous special chars.
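Put together, the corrected order could be sketched like this. The function name clean_for_html is made up, and it assumes you want plain text that is safe to echo into HTML.

```php
<?php
// Hypothetical helper: strip tags first, then encode what is left.
function clean_for_html(string $input): string
{
    return htmlspecialchars(strip_tags($input), ENT_QUOTES, 'UTF-8');
}

echo clean_for_html('<b>Tom & "Jerry"</b>'), "\n";
// Prints: Tom &amp; &quot;Jerry&quot;
```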