Any way to stop receiving a potential multi-byte character string?

Question

After reading this excellent eye-opener article written by a security expert, I have become quite suspicious of incoming strings, because mysql_real_escape_string can be tricked...

The problem stems strictly from multi-byte character sets such as GBK. If the user input is not multi-byte, there is no issue: mysql_real_escape_string is good enough against SQL injection, provided that you do your basic data type validation properly.

I'm not saying multi-byte is evil... but if you do not have to deal with multi-byte situations, then don't. Stick to utf-8 if that works for you and just stay in utf-8 all the time... But the question is how? Because it's the user who starts the process by sending you a non-utf-8 string, perhaps a multi-byte string in an encoding like GBK...

How do you make sure that you can successfully and reliably reject that user input, then? From what I have read, it is impossible to know which character set an incoming user string is in. Then what?

In other words, how do you make sure that you are working with utf-8 user strings? I'm asking because, as far as I can tell, the PHP filter/sanitization functions are all designed to deal with utf-8 input and don't know how to deal with other multi-byte encodings. As the article points out, the protection measure itself becomes the cause of the failure.

Oh, and please don't just say "use prepared statements"... I'm aware of that excellent option already.

score 2 · Accepted Answer · edited May 23 '17 at 12:27

This excellent eye-opener article was written almost a decade ago and has become a little obsolete.
Things have improved a little since then.
PHP got a function that controls mysql_real_escape_string() and makes it really take "into account the current character set of the connection", as the documentation says.

The problem stems not from multi-byte character sets such as GBK themselves, but from character set misinterpretation: the escaping function and the server disagree about which encoding the connection uses. So you just have to tell MySQL which character set you are working with, and there is no point in detecting multi-byte strings at all.

So, just set the proper character set using mysql_set_charset() and you will be safe.
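For readers on current PHP: the mysql_* extension was removed in PHP 7.0, so the sketch below swaps in mysqli, which offers the same pair of calls as methods. It is only an illustration of the idea above. The helper name `escape_for_connection`, the `utf8mb4` default, and the already connected `$db` handle are assumptions, not something from the original answer.

```php
<?php
// Hypothetical helper: $db is assumed to be an already connected mysqli instance.
function escape_for_connection(mysqli $db, string $input, string $charset = 'utf8mb4'): string
{
    // Tell the client library (not only the server) which character set the
    // connection uses. Issuing "SET NAMES ..." as a plain query would change the
    // server side only and leave the escaping function unaware of it.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set connection charset: ' . $db->error);
    }

    // The escaping now takes the connection's character set into account.
    return $db->real_escape_string($input);
}
```

In real code you would normally call set_charset() once, right after connecting, rather than on every escape; it is folded into one function here only to keep the example short.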

Here is a little demo I wrote on the topic.

Also keep in mind that not every multi-byte encoding is vulnerable. utf-8 is pretty safe. Otherwise we would be suffering a zillion injections today.
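To make the difference concrete, here is a small byte-level sketch that needs no database. It uses addslashes() as a stand-in for any escaper that does not know the character set, and the byte pair 0xBF 0x27 commonly used to illustrate the GBK case. The helper `all_bytes_high` is a hypothetical name introduced only for this example.

```php
<?php
// Illustrative payload: byte 0xBF followed by a single quote (0x27).
$payload = "\xbf\x27";

// addslashes() knows nothing about character sets; it inserts a backslash (0x5C)
// in front of the quote.
$escaped = addslashes($payload);
$escapedHex = bin2hex($escaped);
echo $escapedHex, "\n"; // bf5c27

// Read as GBK, 0xBF 0x5C forms a single two-byte character, so the backslash is
// swallowed and the quote (0x27) is left unescaped.

// Hypothetical helper: true if every byte of $s is >= 0x80, i.e. none of the
// bytes can be mistaken for an ASCII character such as a backslash or a quote.
function all_bytes_high(string $s): bool
{
    foreach (str_split($s) as $byte) {
        if (ord($byte) < 0x80) {
            return false;
        }
    }
    return true;
}

$utf8Char = "\u{4e2d}";                 // a three-byte utf-8 character (e4 b8 ad)
var_dump(all_bytes_high($utf8Char));    // every byte of a multi-byte utf-8 character is >= 0x80
var_dump(all_bytes_high("\xbf\x5c"));   // the GBK pair contains 0x5C, an ASCII backslash
```

In utf-8 every byte of a multi-byte character has its high bit set, so a backslash or quote byte can never be part of another character. That is why a misinterpreted utf-8 connection does not open the same hole.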

answered Feb 02 '12 at 04:33

Your Common Sense

Excellent answer! That made my day... Is there any need to also make sure that the PHP pages contain this one-liner across the board? `header('Content-type: text/html; charset=utf-8');` Or better yet, just set `default_charset = "utf-8"` in php.ini once and for all? Are `mysql_set_charset()` and `default_charset = "utf-8"` taking care of different things? Do they support one another? If you have one, do you still need the other? – Average Joe Feb 02 '12 at 22:31
HTTP and SQL are indeed different things. The HTTP encoding has nothing to do with this topic. You have to set it, using either header() or the ini setting, to make the site work properly, but it won't affect SQL security at all. – Your Common Sense Feb 04 '12 at 16:13
