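If you mean "clean" as in "secure" for HTML output, then htmlspecialchars() is quite alright. You may want to use htmlentities() instead. It converts every character that has a named HTML entity (for example é to &eacute;), not just the handful of HTML-special ones (&, <, > and the quotes).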
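To see the difference, here is a quick comparison of both functions on the same made-up string. ENT_QUOTES and 'UTF-8' are passed explicitly so the result doesn't depend on the PHP version's defaults. The "\u{E9}" escape for é assumes PHP 7 or later.

```php
<?php
// Same input through both functions; flags and charset set explicitly.
$s = "Caf\u{E9} <b>\"Tom & Jerry\"</b>";

// htmlspecialchars(): only &, <, >, " and ' are converted.
$special = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');

// htmlentities(): also converts characters that have a named entity, such as é.
$entities = htmlentities($s, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Café &lt;b&gt;&quot;Tom &amp; Jerry&quot;&lt;/b&gt;
echo $entities, "\n"; // Caf&eacute; &lt;b&gt;&quot;Tom &amp; Jerry&quot;&lt;/b&gt;
```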
Some characters get past htmlentities() and htmlspecialchars() unencoded (for example, those outside Latin-1 that have no named entity), so you might want to "UTF-8 proof" your output. You can use this function, which I found as a comment on the PHP docs.
// Unicode-proof htmlentities.
// Returns 'normal' chars as chars and weirdos as numeric HTML entities.
function superentities( $str ){
    // get rid of existing entities, else they would be double-escaped
    // (stripslashes() only undoes magic_quotes escaping; drop it on PHP 5.4+, where magic quotes no longer exist)
    $str = html_entity_decode(stripslashes($str), ENT_QUOTES, 'UTF-8');
    $ar = preg_split('/(?<!^)(?!$)/u', $str); // array of every (multi-byte) character
    $str2 = ''; // initialise the output buffer
    foreach ($ar as $c){
        $o = ord($c); // value of the character's first byte
        if ( (strlen($c) > 1) ||        /* multi-byte [unicode] */
             ($o < 32 || $o > 126) ||   /* control chars / non-ASCII bytes */
             ($o > 33 && $o < 40) ||    /* quotes + ampersand (also #, $, %) */
             ($o > 59 && $o < 63)       /* html: <, =, > */
        ) {
            // convert to numeric entity; the map covers all of Unicode (up to U+10FFFF)
            $c = mb_encode_numericentity($c, array(0x0, 0x10FFFF, 0, 0x1FFFFF), 'UTF-8');
        }
        $str2 .= $c;
    }
    return $str2;
}
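If you don't need the character-by-character control of superentities(), a shorter sketch of the same idea is to let htmlspecialchars() handle the markup characters and then have mb_encode_numericentity() turn every code point from U+0080 upward into a numeric entity in one call. The name ascii_safe_html is hypothetical, and the sketch assumes valid UTF-8 input and the mbstring extension. Unlike superentities(), it does not decode existing entities first, so an input &amp; comes out as &amp;amp;.

```php
<?php
// Hypothetical helper: HTML-escape, then make the output pure ASCII.
// Assumes $s is valid UTF-8 and mbstring is installed.
function ascii_safe_html(string $s): string
{
    // &, <, >, " and ' become entities
    $escaped = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    // convmap is [start, end, offset, mask]: every code point from U+0080
    // to U+10FFFF becomes a decimal numeric entity such as &#233;
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo ascii_safe_html("Caf\u{E9} <b>\"ok\"</b> \u{1F600}"), "\n";
// Caf&#233; &lt;b&gt;&quot;ok&quot;&lt;/b&gt; &#128512;
```

The output contains only ASCII, so it survives a page or database that isn't declared as UTF-8. The trade-off is that #, $, % and = are left as-is, which is harmless in HTML text.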
As for escaping your data when it enters the database, you can apply htmlentities() before you insert it. Then, when you output it, you can apply it again for good measure. Be sure not to double-encode, though, or &amp; turns into &amp;amp; and your text becomes unreadable. Here's an example:
// Decode existing HTML entities (stripslashes() undoes magic-quotes escaping)
$OutputStringRaw = html_entity_decode(stripslashes($str), ENT_QUOTES, 'UTF-8');
// Now you can apply htmlentities() (or whatever else) without fear of double encoding.
$OutputStringClean = htmlentities($OutputStringRaw, ENT_QUOTES, 'UTF-8');
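htmlentities() also has a fourth parameter, $double_encode, which tells it to leave entities that are already present alone. This is a lighter alternative to decoding first. A small sketch with a made-up string:

```php
<?php
// Text that already contains an entity (&amp;) plus a raw special character (<).
$s = 'Tom &amp; Jerry <3';

// Default ($double_encode = true): existing entities are encoded again.
$twice = htmlentities($s, ENT_QUOTES, 'UTF-8');

// $double_encode = false: existing entities are kept, raw characters still encoded.
$once = htmlentities($s, ENT_QUOTES, 'UTF-8', false);

echo $twice, "\n"; // Tom &amp;amp; Jerry &lt;3
echo $once, "\n";  // Tom &amp; Jerry &lt;3
```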
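But really, it's best to store the entries in the database without any HTML escaping and escape them only when you output them. When you insert your data, either use PDO with prepared statements (here's an ok tutorial on it), or keep using the mysql_real_escape_string() you've been using (mysqli_real_escape_string() on PHP 7 and later, where the old mysql_* extension was removed).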
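Here's a minimal sketch of that store-raw, escape-on-output approach with a PDO prepared statement. To stay self-contained it uses an in-memory SQLite database, so it assumes the pdo_sqlite extension is installed. The comments table is hypothetical; with MySQL you would use a mysql: DSN and your real table instead.

```php
<?php
// In-memory SQLite stand-in for the real database (assumes pdo_sqlite).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$input = "O'Reilly says <script>alert(1)</script>";

// The placeholder keeps the value out of the SQL text, so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// Read it back: the database holds the raw string, untouched.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// Escape only at output time, for the HTML context it is printed into.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // O&#039;Reilly says &lt;script&gt;alert(1)&lt;/script&gt;
```

Keeping the stored value raw means you can later print it into a different context (JSON, a CSV export, a plain-text email) with whatever escaping that context needs, instead of first undoing HTML entities.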