2

OK, I understand that this subject is a hotbed, and that the right answer depends on the code you are using. I have three situations that need to be resolved.

  1. I have a form where people need to be able to write comments and statements that use commas, tildes, etc., while the application still remains safe from attacks.

  2. I have people entering dates like this: 10/13/11 (mm/dd/yy, in English). Can this be sanitized?

  3. How do I use htmlspecialchars(), htmlentities() and real_escape_string() correctly? I've read the php.net site and some posts here, but this seems to be a situation where the right answer depends on who is reading the question.

I really can't accept that... there has to be an answer wherein text formats similar to that which I am posting here can be sanitized. I'd like to know if and how it is possible.

Thanks... because it seems to me that when asking this question in other places it tends to annoy... I am learning what I need to know but I think I have hit a plateau in what I can know without an example of what it is meant to do...

Thanks in advance.

Ben Swinburne
Matt Ridge
  • It's amazing how much time you end up spending on string sanitation in web applications. I dare say that the vast majority of PHP code I have written was pure string manipulation. The actual 'logic' parts pale in comparison. – Marijn van Vliet Dec 06 '11 at 18:23

3 Answers

10

It's a very important question and it actually has a simple answer in the form of encodings. The problem you are facing is that you use a lot of languages at the same time. First you are in HTML, then in PHP and a few seconds later in SQL. All these languages have their own syntax rules.

The thing to remember is: a string should at all times be in its proper encoding.

Let's take an example. You have an HTML form and the user enters the following string into it:

I really <3 dogs & cats ;')

Upon pressing the submit button, this string is sent to your PHP script. Let's assume this is done through GET. The string gets appended to the URL, which has its own syntax (the & character has special meaning, for instance), so we are changing languages. This means the string must be transformed into the proper URL encoding. In this case the browser does it, but PHP also has a urlencode function for that.
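As a small sketch of that step, this is roughly what the URL encoding looks like when done by hand in PHP. The variable names are only illustrative; in the form scenario the browser encodes and PHP decodes into $_GET for you.

```php
<?php
// The raw string exactly as the user typed it.
$comment = "I really <3 dogs & cats ;')";

// urlencode() produces form-style URL encoding (spaces become "+").
$encoded = urlencode($comment);
echo $encoded, "\n"; // I+really+%3C3+dogs+%26+cats+%3B%27%29

// urldecode() reverses it, giving back the original string.
var_dump(urldecode($encoded) === $comment); // bool(true)
```

Note that the & inside the comment becomes %26, so it can no longer be mistaken for the separator between URL parameters.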

In the PHP script, the string is stored in $_GET, encoded as a PHP string. As long as you are coding PHP, this is perfectly fine. But now let's put the string to use in a SQL query. We change languages and syntax rules, therefore the string must be encoded as SQL through the mysql_real_escape_string function.
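The mysql_* functions used in this answer were removed in PHP 7, so here is a hedged sketch of the same PHP-to-SQL step using a different technique: a PDO prepared statement, where the driver handles the encoding. The in-memory SQLite database and the table layout are assumptions made only so the example runs without a server; with MySQL you would change the DSN.

```php
<?php
// Assumption: PDO with the SQLite driver, used only so the sketch needs no server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (comment TEXT)');

$comment = "I really <3 dogs & cats ;')";

// The placeholder keeps the data out of the SQL text entirely,
// so the quote in the comment cannot end the string literal.
$stmt = $pdo->prepare('INSERT INTO posts (comment) VALUES (:comment)');
$stmt->execute([':comment' => $comment]);

// The value comes back exactly as it went in.
$stored = $pdo->query('SELECT comment FROM posts')->fetchColumn();
var_dump($stored === $comment); // bool(true)
```

The database ends up holding the raw comment, with no escaping characters and no HTML entities mixed in.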

At the other end, we might want to display the string back to the users again. We retrieve the string from the database and it is returned to us as a PHP string. When we want to embed it in HTML for output, we're changing languages again so we must encode our string to HTML through the htmlspecialchars function.
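A minimal sketch of that output step follows. Passing ENT_QUOTES and 'UTF-8' explicitly is my assumption about sensible settings, not something the answer specifies. The second half contrasts htmlspecialchars with htmlentities, which also converts every character that has a named entity.

```php
<?php
$comment = "I really <3 dogs & cats ;')";

// htmlspecialchars() encodes only the characters special to HTML: & < > " '
$html = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // I really &lt;3 dogs &amp; cats ;&#039;)

// htmlentities() goes further and converts any character with a named entity.
$word = "caf\u{e9}"; // "café", written with an escape so the file encoding does not matter
$special  = htmlspecialchars($word, ENT_QUOTES, 'UTF-8'); // unchanged
$entities = htmlentities($word, ENT_QUOTES, 'UTF-8');     // caf&eacute;
echo $entities, "\n";
```

For pages served as UTF-8, htmlspecialchars is usually enough, since only the five special characters can change how the browser parses the markup.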

Along the way, the string has always been in the proper encoding, which means any character the user can come up with will be dealt with accordingly. Everything should run smoothly and safely.

A thing to avoid (sometimes this is even recommended by the ignorant) is prematurely encoding your string. For instance, you could apply htmlspecialchars to the string before putting it in the database. This way, when you retrieve the string later from the database you can stick it in the HTML no problem. Sounds great? Yeah, really great until you start getting support tickets from people wondering why their PDF receipts are full of &amp; &gt; junk.
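To make the problem concrete, here is a short sketch of what premature encoding does to the stored value. The variable names are hypothetical; the point is only that the stored string is no longer the text the user wrote.

```php
<?php
$comment = "dogs & cats";

// Premature encoding: this is what would end up in the database.
$stored = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $stored, "\n"; // dogs &amp; cats  (this is what a plain-text or PDF receipt shows)

// If the output code later encodes again, as it should for raw data,
// the HTML page shows the entity itself instead of the ampersand.
$page = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $page, "\n"; // dogs &amp;amp; cats
```

Storing the raw string and encoding once, at the moment of output, avoids both symptoms.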

In code:

form.html:

<form action="post.php" method="get">
    <textarea name="comment">I really &lt;3 dogs &amp; cats ;')</textarea>
    <input type="submit"/>
</form>

URL it generates:

http://www.example.org/post.php?comment=I+really+%3C3+dogs+%26+cats+%3B%27%29

post.php:

// Connect to database, etc....

// Place the new comment in the database
$comment = $_GET['comment']; // Comment is encoded as PHP string

// Using $comment in a SQL query, need to encode the string to SQL first!
$query = "INSERT INTO posts SET comment='". mysql_real_escape_string($comment) ."'";
mysql_query($query);

// Get list of comments from the database
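$query = "SELECT comment FROM posts";
$result = mysql_query($query); // Run the query; rows are fetched from the result, not from the query string

print '<html><body><h2>Posts</h2>';
print '<table>';

while ($post = mysql_fetch_assoc($result)) {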
    // Going from PHP string to HTML, need to encode!
    print '<tr><td>'. htmlspecialchars($post['comment']) .'</td></tr>';
}

print '</table>';
print '</body></html>';
Marijn van Vliet
  • +1 for mentioning why HTML-encoding prior to database insert is a bad idea. – Justin ᚅᚔᚈᚄᚒᚔ Dec 06 '11 at 18:43
  • Can you show a concept of what you are talking about? Also, just curious... is there a way to remove the &amp; &gt; extras? – Matt Ridge Dec 06 '11 at 18:46
  • If you ever end up with `&amp; &gt;` extras, it could be that the string has been encoded twice with `htmlspecialchars`. You can use `htmlspecialchars_decode` to get rid of them. But be very careful when you do this! This possibly opens up the string again to JavaScript injection attacks. It's almost always better to hunt down where you erroneously apply the second `htmlspecialchars`. – Marijn van Vliet Dec 07 '11 at 09:09
  • @Rodin Thanks, you are the first person to really show by example what each piece of code actually does. It is greatly appreciated. It makes it much easier to understand how these are used correctly. I am sure many other people will appreciate your effort as well. – Matt Ridge Dec 07 '11 at 12:32
  • @Rodin I do have a question, you use print, it does the same thing as echo correct? – Matt Ridge Dec 07 '11 at 13:18
  • @MattRidge I guess I'm just old fashioned that way. I like the word 'print' :) – Marijn van Vliet Dec 08 '11 at 09:41
1

The crucial thing is to understand what each sanitising function available to you is for, and when it should be used. For example, database-escaping functions are designed to make data safe to insert into the database, and should be used as such; HTML-escaping functions are designed to neutralise malicious HTML code (such as injected JavaScript) and make it safe to output data for your users to view. Sanitise the right thing at the right time.*

* There are two different basic approaches you can take: you can sanitise HTML when you receive it, or you can store it exactly as you received it and sanitise it only when it is time to output it to the user. Each of these methods has its proponents, but the second one is probably the less prone to problems. With the first one, what do you do if a flaw is discovered in your sanitising procedure and you find you have insufficiently sanitised content stored in your database?

Dates can be sanitised using a date parsing function. In PHP you might look at strtotime(). Your objective is typically to take a string representation of a date and output either an object representing a date, or another string that represents the same date in a canonical way (that is: in a specific format).

Hammerite
  • Ok... I'm looking to insert a statement that would allow characters like the statement I am making now. That's all, and then allowing this statement to be viewed. While being sanitized. – Matt Ridge Dec 06 '11 at 18:13
  • So you have two scripts: one that receives the content (the message) and inserts it into the database, and one that retrieves the content and displays it. The first script is sending data to the database, so it needs to use a database escaping function to make the data safe to use in that way. The second script is sending data to the user's browser, so it needs to use HTML-escaping functions to neutralise the possible ways in which the browser's treatment of that data could harm the user. HTML-escaping, though, is not the only thing to consider; look up cross-site request forgery. – Hammerite Dec 06 '11 at 18:19
  • I do this in one page, not multiple if I get what you are saying... I hate to say this though, I've already looked up what you are suggesting. I really don't need another way of doing something if I don't understand how to do the original question correctly in the first place. – Matt Ridge Dec 06 '11 at 18:41
0

Regarding the sanitization of dates, PHP has some built-in functions that can be helpful. The strtotime() function will convert many common English date/time formats into a Unix timestamp (it returns false if it cannot parse the input), which can then be passed to the date() function to convert it to whatever formatting you like.

For example:

$date_sql = date( "Y-m-d", strtotime( $_POST["date"] ) );
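Since strtotime() returns false for input it cannot parse, a stricter variant may be useful when you know the expected format. The sketch below uses a different technique, DateTime::createFromFormat with the mm/dd/yy format from the question; the helper name parseUsDate is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: accepts only mm/dd/yy input and returns it as
// a canonical Y-m-d string, or null when the input is not a valid date.
function parseUsDate(string $input): ?string
{
    $dt = DateTime::createFromFormat('m/d/y', $input);

    // getLastErrors() returns false on PHP 8.2+ when nothing went wrong,
    // and an array with counters on older versions.
    $errors = DateTime::getLastErrors();
    $hasProblems = $errors !== false
        && ($errors['warning_count'] > 0 || $errors['error_count'] > 0);

    if ($dt === false || $hasProblems) {
        return null;
    }
    return $dt->format('Y-m-d');
}

var_dump(parseUsDate('10/13/11'));                // string(10) "2011-10-13"
var_dump(parseUsDate('13/45/11'));                // NULL (no such month or day)
var_dump(parseUsDate("'; DROP TABLE posts; --")); // NULL
```

Because the function only ever returns a string made of digits and dashes, or null, its output is in a known shape before it gets anywhere near a query. That does not replace encoding the value for SQL, ideally with a prepared statement.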

Kris Craig
  • But would this protect from injections? – Matt Ridge Dec 07 '11 at 12:45
  • This really has nothing to do with injections as it's happening at the PHP level. If you're looking to prevent an injection, I would recommend using prepared statements, which are supported in the php_mysqli extension. – Kris Craig Dec 08 '11 at 23:24
  • Oh and shorter answer is: Yes, this protects against injections, since (at least as far as I know) date( "Y-m-d" ), regardless of its input, won't output anything that could be useful in an SQL injection attack. =) – Kris Craig Dec 08 '11 at 23:26