I am using this currently, but it doesn't seem to be working for bullets:

function sanitizeMySQL($var)
{
    $var = mysql_real_escape_string($var);
    $var = sanitizeString($var);
    return $var;
}

function sanitizeString($var)
{
    $var = str_replace('•','&bull;', $var);
    $var = htmlentities($var);
    $var = strip_tags($var);
    return $var;
}

This is how bullets show up in my db after someone has submitted them through a textarea:

•

EDIT: This is now what I am getting: •.

I do have bullets stored in my db, so I know it allows them. Is there a correct way to store bullets in latin-1 encoding?

sixshift04
  • Are your tables using latin-1 encoding? Because I'm fairly sure the round bullet is a UTF-8 thing. – Palladium Jul 26 '12 at 17:15
  • This is an encoding problem. What encoding are your pages in, and what encoding are you using in the database? Side note: `strip_tags()` and `htmlentities()` are not necessary for database sanitization, see [The ultimate clean/secure function](http://stackoverflow.com/q/4223980). – Pekka Jul 26 '12 at 17:16
  • You should really sanitize the string **before** you escape it. I'm pretty sure JS can be injected through this and maybe even SQL. – Vatev Jul 26 '12 at 17:19
  • What encoding are you using in the *page*? – Pekka Jul 26 '12 at 17:54

1 Answer

The data submitted through your form and your source code do not use the same encoding. The • in your source file is therefore a different byte sequence from the • in the submitted data, so the replacement never finds a match and nothing is replaced. Unify on a common encoding. See Handling Unicode Front To Back In A Web App.
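To make the mismatch concrete, here is a minimal sketch. It assumes the source file holds the bullet as UTF-8 while the form data arrives as Windows-1252; the question does not confirm which encodings are actually involved, so treat both as illustrative. The variable names are hypothetical, and the bytes are written as escapes so the example does not depend on how this snippet itself is saved.

```php
<?php
// The same character, U+2022 BULLET, in two encodings (assumed for illustration).
$utf8Bullet   = "\xE2\x80\xA2"; // three bytes in UTF-8
$cp1252Bullet = "\x95";         // one byte in Windows-1252

// Pretend the needle comes from a UTF-8 source file and the haystack
// from a form that submitted Windows-1252.
$submitted = "item one " . $cp1252Bullet . " item two";

// str_replace() compares raw bytes, so the UTF-8 needle never matches.
$unchanged = str_replace($utf8Bullet, '&bull;', $submitted);

// Once the input is converted to the encoding of the needle, it matches.
$converted = iconv('Windows-1252', 'UTF-8', $submitted);
$replaced  = str_replace($utf8Bullet, '&bull;', $converted);

var_dump($unchanged === $submitted);
var_dump($replaced);
```

Converting on the fly like this only demonstrates the cause. The durable fix is still the one named above: use a single encoding for the page, the source files, the database connection, and the tables.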

Also, your sanitization strategy is pretty weird. I don't know what you have against "•"; a general "sanitization" function should not replace it. Furthermore, you HTML-escape everything first and strip tags afterwards. Hint: there won't be any tags left once you have escaped them. Finally, you should not modify the string any further after you have SQL-escaped it. See The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
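One way to keep the two kinds of escaping apart is sketched below, under stated assumptions: it uses PDO prepared statements in place of the `mysql_*` functions from the question, and an in-memory SQLite database stands in for MySQL so the snippet runs on its own. The table and variable names are hypothetical. The text is stored exactly as submitted, and HTML escaping happens only at the moment it is written into a page.

```php
<?php
// Assumption: SQLite in memory replaces the real MySQL server here.
// For MySQL the DSN would carry the connection charset as well, e.g.
// 'mysql:host=localhost;dbname=app;charset=utf8mb4' (hypothetical names).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)');

// Raw user input: a UTF-8 bullet, some markup, and a quote character.
$input = "\xE2\x80\xA2 first point <b>bold</b> O'Reilly";

// Storage: the prepared statement keeps the data out of the SQL text,
// so no manual SQL escaping or other modification is needed.
$stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (:body)');
$stmt->execute([':body' => $input]);

// Retrieval: the stored value is the submitted value, unchanged.
$stored = $pdo->query('SELECT body FROM notes WHERE id = 1')->fetchColumn();

// Output: escape for HTML only when the text goes into an HTML page.
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $html, "\n";
```

With this split the bullet needs no special treatment at all, and neither `strip_tags` nor a bullet-replacing helper appears anywhere in the storage path.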

deceze
  • So would you recommend I change the encoding of either my db or page? Also, I have changed the sanitization function to this: `function sanitizeMySQL($val){ $val = htmlentities($val); $val = strip_tags($val); $val = mysql_real_escape_string($val); return $val; }` – sixshift04 Jul 26 '12 at 19:46
  • Again, the encoding of your *source code* and the encoding of the submitted data appear not to match. And `strip_tags` after `htmlentities` is still nonsense, you probably don't need either. – deceze Jul 26 '12 at 20:36
  • Is it better to use UTF8 throughout the whole chain? – sixshift04 Jul 26 '12 at 21:09
  • My DB now has a default collation of utf8_bin, and the specific columns have been changed to UTF-8, but I still get the same results. – sixshift04 Jul 26 '12 at 23:30
  • Read both articles linked in my answer. Then read some more of the articles linked from there. – deceze Jul 27 '12 at 05:41