Efficiently sanitize user entered text

Question

I have an HTML form that accepts user-entered text of about 1000 characters and submits it to a PHP page, where it is stored in a MySQL database. I use PDO with prepared statements to prevent SQL injection. But what else is needed to sanitize the text entered by the user?

I want to prevent script injection, XSS attacks, etc.

Similar to this question: http://stackoverflow.com/questions/2745058/php-input-sanitizer. One of their answers points to this website: http://htmlpurifier.org/ — Yuri, Nov 17 '11 at 10:28
Thank you. I was actually looking at it; I found it by googling a few seconds ago. — Vpp Man, Nov 17 '11 at 10:34
In a comment there, Sorcy says that htmlpurifier does not sanitize certain XSS attack scripts. So is it not a fully trustworthy approach? — Vpp Man, Nov 17 '11 at 10:46
Keep in mind that that comment was over a year old. I would assume that it is fixed by now. Nevertheless it would be a good idea to verify these manually :). On the other hand: Making something from scratch is bound to have more security issues than using a specialized product. So I think trusting it would be a rather safe assumption. — Yuri, Nov 17 '11 at 10:49

Polynomial · Accepted Answer · 2011-11-18T14:37:22.587

Security is an interesting concept and attracts a lot of people to it. Unfortunately it's a complex subject and even the professionals get it wrong. I've found security holes in Google (CSRF), Facebook (more CSRF), several major online retailers (mainly SQL injection / XSS), as well as thousands of smaller sites both corporate and personal.

These are my recommendations:

1) Use parameterised queries
Parameterised queries force the values passed to the query to be treated as separate data, so that the input values cannot be parsed as SQL code by the DBMS. A lot of people will recommend that you escape your strings using mysql_real_escape_string(), but contrary to popular belief it is not a catch-all solution to SQL injection. Take this query for example:

SELECT * FROM users WHERE userID = $_GET['userid']

If $_GET['userid'] is set to 1 OR 1=1, there are no special characters and it will not be filtered. This results in all rows being returned. Or, even worse, what if it's set to 1 OR is_admin = 1?

Parameterised queries prevent this kind of injection from occurring.
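
As a rough illustration of the point above, here is a minimal sketch using PDO. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database in place of MySQL so that it is self-contained; the table layout and the findUsersById helper are made up for the example.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (userID INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER)');
$pdo->exec("INSERT INTO users (userID, name, is_admin) VALUES (1, 'alice', 0)");
$pdo->exec("INSERT INTO users (userID, name, is_admin) VALUES (2, 'bob', 1)");

// Hypothetical helper: the value is bound as data and never spliced into the SQL text.
function findUsersById(PDO $pdo, string $userId): array
{
    $stmt = $pdo->prepare('SELECT userID, name FROM users WHERE userID = :id');
    $stmt->execute([':id' => $userId]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$legit  = findUsersById($pdo, '1');        // matches the row for alice
$attack = findUsersById($pdo, '1 OR 1=1'); // compared as one value, not parsed as SQL
```

In this SQLite setup the second call finds no rows. How a non-numeric string compares against an integer column differs between database systems, but in every case the "OR 1=1" part is never executed as SQL.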

2) Validate your inputs
Parameterised queries are great, but sometimes unexpected values might cause problems with your code. Make sure that you're validating that they're within range and that they won't allow the current user to alter something they shouldn't be able to.

For example, you might have a password change form that sends a POST request to a script that changes their password. If you place their user ID as a hidden variable in the form, they could change it. Sending id=123 instead of id=321 might mean they change someone else's password. Make sure that EVERYTHING is validated correctly, in terms of type, range and access.
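
A minimal sketch of the access check described above, assuming the logged-in user's ID is kept in the session under a user_id key. The function name and the array parameters (standing in for $_SESSION and $_POST) are hypothetical.

```php
<?php
// Hypothetical helper: decide whose password may be changed.
// $session stands in for $_SESSION and $post for $_POST so it can be called directly.
function resolveTargetUserId(array $session, array $post): ?int
{
    if (!isset($session['user_id'])) {
        return null; // not logged in
    }
    $sessionId = (int) $session['user_id'];

    // If the form still carries an id, it must be an integer >= 1
    // and must match the logged-in user.
    if (isset($post['id'])) {
        $postedId = filter_var(
            $post['id'],
            FILTER_VALIDATE_INT,
            ['options' => ['min_range' => 1]]
        );
        if ($postedId === false || $postedId !== $sessionId) {
            return null; // wrong type, out of range, or someone else's account
        }
    }

    return $sessionId;
}
```

The simpler design is to drop the hidden field altogether and always take the ID from the session; the check above only matters if the form has to keep sending it.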

3) Use htmlspecialchars to escape displayed user-input
Let's say your user enters their "about me" as something like this:
</div><script>alert('hello!');</script><div>
The problem with this is that your output will contain markup that the user entered, so the browser will run the script. Trying to filter this yourself with blacklists is a bad idea. Use htmlspecialchars when outputting the text so that special characters such as <, >, & and " are converted to HTML entities, and the browser displays them as text instead of interpreting them as tags.
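
A small sketch of escaping on output. The wrapper name e() is just a convention chosen for this example; passing ENT_QUOTES and the charset explicitly is an assumption about what you want (single quotes escaped too, UTF-8 input).

```php
<?php
// Hypothetical wrapper: ENT_QUOTES also escapes single quotes; charset is explicit.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// A hostile "about me" value of the kind shown above.
$aboutMe = "</div><script>alert('hello!');</script><div>";

echo '<div class="about">' . e($aboutMe) . '</div>';
// prints: <div class="about">&lt;/div&gt;&lt;script&gt;alert(&#039;hello!&#039;);&lt;/script&gt;&lt;div&gt;</div>
```

The browser then shows the tags as literal text inside the div instead of executing them.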

4) Don't use $_REQUEST
Cross-site request forgery (CSRF) attacks work by getting the user to click a link or visit a URL that triggers a script which performs an action on a site they are logged in to. The $_REQUEST variable is a combination of $_GET, $_POST and $_COOKIE (exactly which ones depends on the request_order setting), which means that you can't tell the difference between a variable that was sent in a POST request (i.e. through an input tag in your form) and one that was set in the URL as part of a GET (e.g. page.php?id=1).

Let's say the user wants to send a private message to someone. They might send a POST request to sendmessage.php, with to, subject and message as parameters. Now let's imagine someone sends a GET request instead:

sendmessage.php?to=someone&subject=SPAM&message=VIAGRA!

If you're using $_POST, you won't see any of those parameters, as they are set in $_GET instead. Your code won't see $_POST['to'] or any of the other variables, so it won't send the message. However, if you're using $_REQUEST, the $_GET and $_POST values get merged together, so an attacker can set those parameters as part of the URL. When the user visits that URL, they inadvertently send the message. The really worrisome part is that the user doesn't have to do anything. If the attacker creates a malicious page, it could contain an iframe that points to the URL. Example:

<iframe src="http://yoursite.com/sendmessage.php?to=someone&subject=SPAM&message=VIAGRA!">
</iframe>

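This results in the user sending messages to people without ever realising they did anything. For this reason, you should avoid $_REQUEST and use $_POST and $_GET instead. Note that reading from $_POST only blocks the URL-based variant: a malicious page can also auto-submit a hidden POST form, so state-changing requests should additionally carry a per-session token that the server verifies.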
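
A minimal sketch of what a stricter sendmessage.php check could look like. The function name, the csrf_token field and the idea of storing a random token in the session are assumptions added for illustration, not something the setup above already has; the parameters stand in for $_SERVER['REQUEST_METHOD'], $_POST and $_SESSION.

```php
<?php
// Hypothetical check for sendmessage.php.
function isAllowedSendRequest(string $method, array $post, array $session): bool
{
    if ($method !== 'POST') {
        return false; // parameters placed in the URL are ignored entirely
    }
    foreach (['to', 'subject', 'message'] as $field) {
        if (!isset($post[$field]) || !is_string($post[$field])) {
            return false;
        }
    }
    // Assumption: a random token was stored in the session and embedded in the form.
    $expected = $session['csrf_token'] ?? '';
    $given    = $post['csrf_token'] ?? '';

    return is_string($expected) && is_string($given)
        && $expected !== ''
        && hash_equals($expected, $given); // constant-time comparison
}

// Would normally be generated once per session and written into the form.
$token = bin2hex(random_bytes(16));
```

A GET request like the iframe URL fails the first check, and a forged POST from another site fails the token comparison because the attacker's page cannot read the token.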

5) Treat everything you're given as suspicious (or even malicious)
You have no idea what the user is sending you. It could be legitimate. It could be an attack. Never trust anything a user has sent you. Convert to correct types, validate the inputs, use whitelists to filter where necessary (avoid blacklists). This includes anything sent via $_GET, $_POST, $_COOKIE and $_FILES.
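
To make "convert to correct types" and "use whitelists" a bit more concrete, here is a short sketch. Both helper names, the list of allowed columns and the page range are invented for the example.

```php
<?php
// Hypothetical helper: only values from a fixed whitelist are accepted;
// anything else falls back to a safe default.
function pickSortColumn($input): string
{
    $allowed = ['name', 'created', 'score'];
    return in_array($input, $allowed, true) ? $input : 'name';
}

// Hypothetical helper: convert to int and enforce a range, null on failure.
function parsePage($input): ?int
{
    $page = filter_var(
        $input,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 1000]]
    );
    return $page === false ? null : $page;
}
```

The whitelist never has to anticipate what an attack looks like; it only has to list what is acceptable, which is why it holds up better than a blacklist.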

If you follow these guidelines, you're at a reasonable standing in terms of security.

Thank you, but I did not understand "_Don't use $_REQUEST_". Can you give some more details about how it can be misused via an iframe? — Vpp Man, Nov 18 '11 at 11:38
I've updated the post to explain CSRF attacks better. Hopefully this will help you understand why you shouldn't use `$_REQUEST`. — Polynomial, Nov 18 '11 at 14:38
Thank you. I understand what you said about $_REQUEST and how it is exploited, but I am still confused about the iframe part. Where does the attacker create the iframe? An iframe is used to display another page inside a frame in a page, like Facebook displaying an app. So can you tell me what happens when an iframe is used by an attacker? — Vpp Man, Nov 18 '11 at 14:45
I will read more articles on iframe attack to understand the idea. — Vpp Man, Nov 18 '11 at 15:13
Also look up the `X-Frame-Options` header for more info on how to prevent clickjacking through iframes. — Polynomial, Nov 18 '11 at 19:37
OK. I read [this article](http://blogs.msdn.com/b/ie/archive/2009/01/27/ie8-security-part-vii-clickjacking-defenses.aspx). So our page will be shown in an iframe by the attacker, inside an attacker-developed page that contains malicious code. That code may steal cookies or session details using the iframed page of our site. What happens if I use X-Frame-Options with a non-compatible browser like IE6 or 7? Will it stop rendering my page? — Vpp Man, Nov 19 '11 at 01:36
It prevents our page from being displayed inside another frame, but it does not stop us from displaying iframe content in our page (like a Facebook Like button placed in our page). Correct? — Vpp Man, Nov 19 '11 at 01:36
Correct. It prevents `iframe` in a malicious website from loading your site. — Polynomial, Nov 19 '11 at 12:35

score 4 · Answer 2 · edited May 23 '17 at 12:29

You need to distinguish between two types of attacks: SQL injection and XSS. SQL injection can be avoided by using prepared statements or the quote functions of your DB library. If you use a quoting function, apply it just before inserting into the database.

XSS can be avoided by escaping all special characters with htmlspecialchars. It is considered good style to store the original input in the database and escape it on output, after you read it back. This way, when you use the input in other contexts where HTML escaping is not needed (text email, JSON-encoded string), you still have the original input from the user.
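
A brief sketch of why storing the original pays off: the same raw value is encoded differently for each output context. The sample string and variable names are made up for the example.

```php
<?php
// Stored in the database exactly as the user entered it.
$raw = 'Pickett & Sons <sales@example.com>';

// Web page: HTML-escape at output time.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// API response: JSON encoding does its own escaping; no HTML entities wanted.
$forJson = json_encode(['name' => $raw]);

// Plain-text email body: the raw text is used as-is.
$forMail = 'Hello from ' . $raw;
```

Had the value been stored already HTML-escaped, the JSON and email outputs would contain "&amp;" and "&lt;" where the user typed "&" and "<".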

Also see this answer to a similar question.

answered Nov 17 '11 at 10:39 by chiborg

Thank you. But this input from the user is NOT supposed to be HTML or script content, so there is no need to store the exact input in the database. Correct? Should I use htmlspecialchars() before inserting into the database? – Vpp Man Nov 17 '11 at 10:43
If you are absolutely sure, that the text will never be used outside of a web page context (email, text file, shell script, etc) then you can escape it before putting it into the database. Otherwise, escape after reading it from the database. – chiborg Nov 17 '11 at 10:47
OK. So if I also want to search on that field, I should not save the text to the database after htmlspecialchars(). Correct? – Vpp Man Nov 17 '11 at 10:50
Yes, that's another use case. If you want to be able to successfully search for "Pickett & Sons" then you must either escape the search term with `htmlspecialchars` before searching in the DB (because the text in the DB would be "Pickett &amp; Sons" if you escaped it with `htmlspecialchars` before writing it to the DB) or you need to store the content unescaped in the DB and escape when doing your output (which is the preferred method). – chiborg Nov 17 '11 at 11:01
That depends on what you try to achieve: If you want to allow *some* HTML content, then by all means use htmlpurifier, at the moment it's the best library in terms of protection (at the price of speed). But you said that no HTML content should be allowed, so in your case, `htmlspecialchars` is better because it is faster and doesn't require an external library. – chiborg Nov 17 '11 at 13:15
So `htmlspecialchars` does block XSS attacks? EDIT: Oh, because all "<" and ">" characters are translated, no tags will be echoed. Right? – Vpp Man Nov 17 '11 at 13:25

score 1 · Answer 3 · edited Nov 17 '11 at 12:47

There are two simple things you need to do to be safe:

1) Use prepared statements or escape the data correctly.
2) When outputting to HTML, always escape using htmlspecialchars().

edited Nov 17 '11 at 12:47 by symcbean
answered Nov 17 '11 at 10:24 by RichardJ

Thank you, but what should I do BEFORE storing it in the database? That is, cleaning the text before saving it. – Vpp Man Nov 17 '11 at 10:26
That's where you use the prepared statements. Or escape the to-be-inserted data yourself by using mysqli_real_escape_string(); – jan Nov 17 '11 at 10:28
That will only prevent SQL injection. Since I use prepared statements (already mentioned in the question), SQL injection is covered. I was asking about other attacks such as XSS. – Vpp Man Nov 17 '11 at 10:40

Just assume the user wants the given text to be used verbatim. If this text contains malicious stuff, that only becomes relevant on output. If you do the output in a safe way, nothing can happen. IOW, just store the "plain" data and output it safely. – glglgl Nov 17 '11 at 13:12
