How a SQL query is parsed depends on the connection character set. Suppose you ran this query:
$value = chr(0xE0) . chr(0x5C);
mysql_query("SELECT '$value'");
then if the connection character set were Latin-1, MySQL would see the invalid query:
SELECT 'à\'
whereas if the character set were Shift-JIS, the byte sequence 0xE0,0x5C would be interpreted as a double-byte character:
SELECT '濬'
Add string literal escaping for security:
$value = mysql_real_escape_string($value);
mysql_query("SELECT '$value'");
Now if you've correctly set the connection character set to Shift-JIS with mysql_set_charset, MySQL still sees:
SELECT '濬'
But if you haven't set the connection character set, and MySQL's default character set is Shift-JIS while PHP's default character set is ASCII, PHP doesn't know that the trailing 0x5C byte is part of a double-byte sequence. It escapes that byte, thinking it is generating the valid output:
SELECT 'à\\'
whilst MySQL reads it using Shift-JIS as:
SELECT '濬\'
With the trailing ' escaped by a backslash, the string literal has been left open. The next ' character in the query will end the string, leaving whatever follows it as raw SQL. If you can inject there, the query is vulnerable.
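The mismatch can be reproduced at the byte level without a database. The following is only a rough sketch: escape_as_single_byte and sjis_literal_end are hypothetical helpers made up for illustration, addslashes stands in for an escaper that believes the connection is single-byte, and the lead-byte ranges are a simplified assumption about how a Shift-JIS-aware parser walks a string literal.

```php
<?php
// Hypothetical stand-in for an escaper that assumes a single-byte
// (ASCII/Latin-1) connection: addslashes() prefixes ', ", \ and NUL
// bytes with a backslash, one byte at a time.
function escape_as_single_byte(string $s): string
{
    return addslashes($s);
}

// Hypothetical helper: scan the bytes that follow an opening quote the way
// a Shift-JIS-aware parser might, and return the offset of the quote that
// closes the literal, or -1 if the literal is never closed.
function sjis_literal_end(string $body): int
{
    $n = strlen($body);
    for ($i = 0; $i < $n; $i++) {
        $b = ord($body[$i]);
        // Assumed Shift-JIS lead-byte ranges: the following byte is a
        // trail byte, whatever its value, so skip it.
        if (($b >= 0x81 && $b <= 0x9F) || ($b >= 0xE0 && $b <= 0xFC)) {
            $i++;
            continue;
        }
        if ($b === 0x5C) { // backslash: the next byte is escaped
            $i++;
            continue;
        }
        if ($b === 0x27) { // unescaped single quote closes the literal
            return $i;
        }
    }
    return -1;
}

$value   = chr(0xE0) . chr(0x5C);
$escaped = escape_as_single_byte($value);

echo bin2hex($escaped), "\n";               // e05c5c
var_dump(sjis_literal_end($value . "'"));   // int(2): literal closes normally
var_dump(sjis_literal_end($escaped . "'")); // int(-1): closing quote was swallowed
```

With the unescaped value the closing quote is found at offset 2; with the byte-wise escaped value the extra 0x5C escapes the quote, so the literal stays open.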
This problem only applies to a few East Asian encodings, such as Shift-JIS, in which multibyte sequences can contain bytes that on their own are valid ASCII characters, like the backslash. If the mismatched encodings both treat low bytes as always ASCII (that is, both are strict ASCII supersets, as in the more common mismatch of Latin-1 vs UTF-8), no such confusion is possible.
Luckily, servers which default to these encodings are uncommon, so in practice this is a rarely exploitable issue. But if you have to use mysql_real_escape_string, you should do it right. (Better to avoid it completely by using parameterised queries, though.)
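As a minimal sketch of the parameterised alternative, assuming the mysqli extension is available: select_value is a hypothetical helper, and the host, credentials and database name in the usage comment are placeholders. The value is sent as a bound parameter, so it never passes through string-literal escaping at all.

```php
<?php
// Hypothetical helper: run "SELECT ?" with $value sent as a bound
// parameter instead of being spliced into the SQL text.
function select_value(mysqli $db, string $value): ?string
{
    $stmt = $db->prepare('SELECT ?');
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $db->error);
    }
    $stmt->bind_param('s', $value); // 's' marks the parameter as a string
    $stmt->execute();
    $stmt->bind_result($out);
    $stmt->fetch();
    $stmt->close();
    return $out;
}

// Usage (placeholder connection details, assumed for illustration):
// $db = new mysqli('localhost', 'user', 'password', 'test');
// $db->set_charset('sjis'); // still set the connection character set explicitly
// var_dump(select_value($db, chr(0xE0) . chr(0x5C)));
```

Setting the character set explicitly is still worthwhile here, since it also determines how the returned data is encoded.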