I am working on a form where the user may enter illegal/special characters in a string that is then submitted to the database. I want to escape/negate these characters in the string and have been using htmlspecialchars(). However, is there a better/faster method?

- There are 2 camps jumping on either 'illegal characters in a query' and 'illegal characters due to XSS attack / html'. I believe you're talking about the first, but you might want to clarify yourself a bit more. – Wrikken Jun 07 '10 at 21:35
- HTML is not SQL. Using HTML tools to avoid SQL problems is like using an English spell checker on Arabic text. – Álvaro González Aug 31 '11 at 11:27
6 Answers
There are no "illegal" characters for a database. A database that cannot store certain characters would be nonsense. There are some service characters, like quotes, that are used to delimit strings. These characters should simply be escaped, not eliminated.
To send a query to the database, you have 2 options:
1. Build the query the usual way, so that it looks exactly like an SQL query you could run in an SQL console.
To do that, you need to understand a whole set of rules, not just "use mysql_real_escape_string".
Rules such as:
- Strings should be both enclosed in quotes and escaped. That's the only purpose of escaping: it just escapes the delimiters (and some other characters: the string termination character and the escape character itself). Without surrounding quotes, mysql_real_escape_string is useless.
- Numbers should be cast to their type explicitly. Although numeric data can be treated just like strings, some numbers, such as LIMIT clause parameters, cannot be escaped and can only be cast.
2. Send the query and the data separately.
This is the most preferred way, as it can be shortened to just "use binding". All strings, numbers and LIMIT parameters can be bound, with no worries at all.
With this method, your query with placeholders is sent to the database as is, and the bound data is sent in separate packets, so it cannot interfere. It is just like code and data separation: you send your program (the query itself) separately from the data.
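Going back to the first option, the rule about casting numbers can be sketched roughly like this. The helper name limit_clause() and its $maxCount cap are hypothetical, not part of the answer, and the "LIMIT offset, count" form assumes MySQL syntax.

```php
<?php
// Hypothetical helper: build a LIMIT clause from untrusted input by casting.
function limit_clause($offset, $count, int $maxCount = 100): string
{
    // (int) turns anything that does not start with digits into 0,
    // so no SQL fragment can survive the cast.
    $offset = max(0, (int) $offset);
    // Keep the row count between 1 and the (assumed) upper bound.
    $count = min($maxCount, max(1, (int) $count));
    return "LIMIT $offset, $count";
}

echo limit_clause('20', '10'), "\n";
echo limit_clause('0; DROP TABLE users', 'abc'), "\n";
```

Because the values are integers by the time they reach the query string, there is nothing left to escape or quote.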
But!
Everything said above covers only the data part of the query.
But sometimes we have to make our query even more dynamic, adding operators or identifiers.
In this case, every dynamic parameter should be hardcoded in our script and chosen from that hardcoded set.
For example, to do dynamic ordering:
$orders = array("name", "price", "qty"); // allowed field names
$sort = isset($_GET['sort']) ? $_GET['sort'] : ''; // avoid a notice when no sort is given
$key = array_search($sort, $orders); // see if we have such a name
$orderby = $orders[$key]; // if not found, $key is false, which as an array index means 0: the first field is used
$query = "SELECT * FROM `table` ORDER BY $orderby"; // value is safe
or dynamic search:
$w = array();
$where = '';
if (!empty($_GET['rooms'])) $w[]="rooms='".mesc($_GET['rooms'])."'";
if (!empty($_GET['space'])) $w[]="space='".mesc($_GET['space'])."'";
if (!empty($_GET['max_price'])) $w[]="price < '".mesc($_GET['max_price'])."'";
if (count($w)) $where="WHERE ".implode(' AND ',$w);
$query="SELECT * FROM `table` $where";
In this example we only add data entered by the user to the query, not field names, which are all hardcoded in the script. With binding, the algorithm would be very similar.
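As a rough sketch of that binding variant: the conditions stay hardcoded, and the user's values are collected in a separate array instead of being escaped into the string. The helper name build_search() and the commented-out $db connection are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the query text with placeholders plus the values to bind.
function build_search(array $input): array
{
    // Field names and operators are hardcoded; only values come from the user.
    $allowed = array(
        'rooms'     => 'rooms = ?',
        'space'     => 'space = ?',
        'max_price' => 'price < ?',
    );
    $w = array();
    $params = array();
    foreach ($allowed as $key => $condition) {
        if (!empty($input[$key])) {
            $w[] = $condition;        // goes into the query text
            $params[] = $input[$key]; // sent separately as bound data
        }
    }
    $where = count($w) ? ' WHERE ' . implode(' AND ', $w) : '';
    return array('SELECT * FROM `table`' . $where, $params);
}

list($sql, $params) = build_search(array('rooms' => '3', 'max_price' => "100' OR '1'='1"));
echo $sql, "\n";
// With a PDO connection in $db (an assumption), the query could then be run as:
// $stmt = $db->prepare($sql);
// $stmt->execute($params);
```

The hostile-looking max_price value never touches the query text; it only ends up in $params.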
And so on.

If you submit this data to the database, take a look at the escape functions for your database.
For MySQL, for example, there is mysql_real_escape_string.
These escape functions take care of any characters that might be malicious, and you will still get your data back exactly as you put it in.
You can also use prepared statements to take care of the data:
$dbPreparedStatement = $db->prepare('INSERT INTO table (htmlcontent) VALUES (?)');
$dbPreparedStatement->execute(array($yourHtmlData));
Or, a little more self-explanatory:
$dbPreparedStatement = $db->prepare('INSERT INTO table (htmlcontent) VALUES (:htmlcontent)');
$dbPreparedStatement->execute(array(':htmlcontent' => $yourHtmlData));
In case you want to save different types of data, use bindParam to define each type. bindParam is a method of the prepared statement, so an integer, for instance, can be bound with $dbPreparedStatement->bindParam(':userId', $userId, PDO::PARAM_INT). Example:
$dbPreparedStatement = $db->prepare('INSERT INTO table (postId, htmlcontent) VALUES (:postid, :htmlcontent)');
$dbPreparedStatement->bindParam(':postid', $postId, PDO::PARAM_INT);
$dbPreparedStatement->bindParam(':htmlcontent', $yourHtmlData, PDO::PARAM_STR);
$dbPreparedStatement->execute();
Where $db is your PHP data object (PDO). If you're not using one, you can learn more about it at PHP Data Objects.
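To see that bound data comes back exactly as it went in, here is a small self-contained sketch. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database in place of a real server; the posts table and the sample values are made up for the example, and bindValue is used instead of bindParam only because the values are already known.

```php
<?php
// Assumption: pdo_sqlite is installed; the in-memory database stands in for the real one.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE posts (postId INTEGER, htmlcontent TEXT)');

$postId = 7;
// Quotes, a semicolon and a comment marker: all of it is plain data here.
$yourHtmlData = "<p>It's \"quoted\"; DROP TABLE posts; --</p>";

$insert = $db->prepare('INSERT INTO posts (postId, htmlcontent) VALUES (:postid, :htmlcontent)');
$insert->bindValue(':postid', $postId, PDO::PARAM_INT);
$insert->bindValue(':htmlcontent', $yourHtmlData, PDO::PARAM_STR);
$insert->execute();

$select = $db->prepare('SELECT htmlcontent FROM posts WHERE postId = :postid');
$select->bindValue(':postid', $postId, PDO::PARAM_INT);
$select->execute();
$stored = $select->fetchColumn();

// The stored string should be identical to what was passed in.
var_dump($stored === $yourHtmlData);
```

No escaping function is called anywhere, and the HTML is stored unmodified; escaping for display is a separate step done at output time.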

- An even better solution to the SQL injection problem is to use parametrized queries. That gets rid of the need to escape by hand completely. – Matti Virkkunen Jun 07 '10 at 21:08
- Yes to parameterized queries. Why steer people towards old technologies that are more prone to injection? – webbiedave Jun 07 '10 at 21:11
- It's not clear what the question is asking. SQL injection or XSS? `mysql_real_escape_string` doesn't prevent XSS, and `htmlspecialchars` doesn't prevent SQL injection. If I had any votes left I would -1 this for `mysql_escape_string`: "This function has been DEPRECATED as of PHP 5.3.0. Relying on this feature is highly discouraged." And +1 to the guy mentioning parametrized queries. – Lotus Notes Jun 07 '10 at 21:12
- I have removed mysql_escape_string from the answer because of its deprecation. Parametrized queries are better, but normally people are working with mysql_query, where a simple escape string is enough. – favo Jun 07 '10 at 21:33
- @favo: not *enough* as much as *all they have.* @Byron: Funny that the docs say *deprecated as of 5.3.0*. `mysql_escape_string` has been deprecated for many years. Check out this archived page from 2004: http://web.archive.org/web/20041207044948/http://us2.php.net/mysql_escape_string – webbiedave Jun 07 '10 at 21:43
- You should probably tell PDO what type of data you are passing it via http://www.php.net/manual/en/pdostatement.bindparam.php ... no? I assume that your way just casts it to a string instead though, right? – SeanJA Jun 07 '10 at 22:00
- At the moment I am just looking to remove characters that will cause problems when I build the queries or use the string for any other reason. I might not have described my problem/question correctly. I didn't know there were all of these various options/methods. – Brook Julias Jun 08 '10 at 02:17
First of all, you should sanitize things when displaying, not before inserting into the database. SQL injections are another story, but probably off-topic.
Second, if you don't need your users to be able to post HTML at all, htmlspecialchars is all you need. It takes care of all the special characters in HTML.
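A minimal sketch of escaping at display time could look like this. The wrapper name h() and the sample comment are hypothetical; the ENT_QUOTES flag and the explicit UTF-8 charset are choices made here, not something the answer prescribes.

```php
<?php
// Hypothetical helper: escape a stored value at the moment it is printed into HTML.
function h(string $value): string
{
    // ENT_QUOTES also converts single quotes, which matters inside attribute values.
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Imagine this string came straight out of the database, stored unmodified.
$comment = '<script>alert("xss")</script> Tom & Jerry';

echo '<p>', h($comment), "</p>\n";
```

The database keeps the original text, and the conversion to entities happens only when the value is written into a page.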

- Wow, that's exactly what I just wanted to write when the "new answer" popped up ;) – Marian Jun 07 '10 at 20:55
- So use JavaScript to sanitize the text as it is being entered? – Brook Julias Jun 07 '10 at 20:57
- @Brook: What? How the hell did you come up with that? That's totally random! – Matti Virkkunen Jun 07 '10 at 20:58
- @Brook never trust anything that comes from a client. If you are using client-side JavaScript, they can easily get around it. Feel free to validate when they hit submit on the client, but don't trust that it has been validated. You need to revalidate on the server. – TheJacobTaylor Jun 07 '10 at 22:14
I am working on a form with the possibility for the user to use illegal/special characters in the string that is to be submitted to the database.
Users can go a lot further than that, actually.
I want to escape/negate these characters in the string and have been using htmlspecialchars(). However, I would like to know if there is a better/faster method.
Use HTML Purifier:
HTML Purifier is a standards-compliant HTML filter library written in PHP. It removes all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist.

- Thanks for the link to HTML Purifier. It looks like it will be extremely helpful. – Brook Julias Jun 07 '10 at 20:56
This is not a problem you want to tackle on your own. There are libraries out there to do this for you, such as the HTML Purifier.

- It was definitely not something I really wanted to tackle on my own. Thanks for the link; HTML Purifier looks like it will be especially helpful. – Brook Julias Jun 07 '10 at 20:55
You haven't stated what these illegal characters may be but you should definitely be using the database API's supplied mechanism to escape data. For instance, if you're using MySQL, use PDO parameterized SQL statements.
