8

Reading some of the SQL injection questions and answers on SO, I saw this answer that suggests you can convert untrusted user input to hex, which will by its very nature not require escaping of any kind, and thus completely and utterly avoid the possibility of SQL injection.

What database abstraction layer you use (PDO, mysqli, mysql, Pear DB, etc.) does not matter.

Example of a normal working query:

$DBH = new PDO('mysql:host=127.0.0.1;dbname=test', 'test', 'testpassword');

// could have been:
//$bookTitle = bin2hex($_GET['title']);
$bookTitle = bin2hex('Catch-22');

$query = "SELECT * from `books` WHERE `title` = UNHEX('$bookTitle')";

foreach ($DBH->query($query) as $row) {
    echo "\n<br />\n";
    print_r($row);
    echo "\n<br />\n";
}

I've included enough code here for you to quickly run a test if you have a database with a table such as:

CREATE TABLE `books` (`id` INT, `title` VARCHAR(100), `author` VARCHAR(100)) ENGINE=InnoDB CHARACTER SET='utf8';
INSERT INTO `books` VALUES(1, 'Catch-22', 'Joseph Heller');

Of course this is oversimplified - you'd normally have input validation, output sanitizing and lots of other abstractions but we want to focus on the question - no more fluff than necessary to help provide easily executed examples.

What I'd like to know is if there are any technical weaknesses to this technique. I am specifically not asking if this technique has human weaknesses (easier to futz as a sloppy programmer, since it is clearly not as clean as using parameterized queries).

Yes, we can all agree that parameterized queries are less susceptible to bad programming or unfortunate oversights. So please stick to the question - does this technique help avoid SQL injections of all kinds, unconditionally?

Can someone show an example of user input that would break this technique? Even a corner case, some particular MySQL server settings or old PHP version that breaks it?

It works just as well for integer replacements:

// could have been:
//$bookID = bin2hex($_GET['id']);
$bookID = bin2hex(1);

$query = "SELECT * from `books` WHERE `id` = UNHEX('$bookID')";

Other thoughts:

  • With this technique you avoid two round trips to the database as happens if using (non-emulated) prepared statements (although prepared statements != parameterized queries).

  • The caveats to some other suggested techniques start to make one go cross-eyed what with the corner case exceptions such as this and this. Does the hex encoding technique avoid all possibility of attackers wreaking havoc using character set encoding tricks and whatever else is out there?

  • It appears that decoding on the database side may be limited to MySQL/MariaDB, although there may be third-party solutions for adding UNHEX() to PostgreSQL (or - not sure - in some databases, you might be able to use the other method of placing the hex literals in the query without using any UNHEX function)

halfer
  • 19,824
  • 17
  • 99
  • 186
messy
  • 224
  • 2
  • 10
  • 2
    As long as you are mixing user input with the query structure, you are one missed input away from a big problem. Whether you use `mysql_real_escape_string` or hex, you still need to protect all input, which prepared statements do automatically. – DCoder Jun 29 '15 at 07:35
  • @DCoder but see links that show mysql_real_escape_string has weaknesses this technique does not. However, point taken on it being easier to make mistakes.@N.B. really? – messy Dec 08 '16 at 11:40
  • 2
    It is correct that `mysql_real_escape_string` has certain weaknesses that this technique does not. However, it does not change the fact that prepared statements are easier to safeguard. In addition to that, prepared statements can also be faster because the DB can reuse a query plan instead of parsing each one from scratch. – DCoder Dec 08 '16 at 18:47

2 Answers2

1

We use this hex-encoding technique on an ecommerce system that has been live for a few years.

I've yet to come across a disadvantage and, once you have an object written for every table in the database (with the corresponding retrieve & store functions), it's all so hidden from the UI developers, they really don't even notice.

This one technique works for

  1. blocking SQL-Injection
  2. supports any client's character encoding
  3. storing binary data

Its also very fast (we had considered base64 encoding, which would produce shorter SQL strings, but require more CPU). In tests we saw no measurable performance difference between hex-encoding all string input and not.

does this technique help avoid SQL injections of all kinds, unconditionally?

I'm yet to see how anybody could SQLi if this technique was always used (when binding any external data into an SQL statement) -

As is common to all techniques (DCoder) - it has to be universally used to provide universal protection - mysql_real_escape_string also suffers this.

user "N.B" rants, but provides no technical argument (may be he was talking about PokemonGo vs CSGO, who knows?), and the only answer clearly misunderstands the technique.

However, I do find the lack of people using / documenting this technique highly surprising - this is the only reason I have any doubt about it.

It's so simply, easy, comprehensive and provides a range of other useful benefits, it's hard to get my head round why everybody isn't doing it and SQLi is a thing of the past???

  • The reason is ridiculously simple: this method is no better than conventional string formatting and worse than binding. So why bother with "using / documenting" at all? – Your Common Sense Sep 06 '16 at 14:21
  • @YourCommonSense only "worse" insofar as easier to make mistakes when writing code. What you didn't mention is that it's technically better than mysql_real_escape_string (see links in question) and as far as I can see, technically equal to using parameterized queries. – messy Dec 08 '16 at 11:37
  • @YourCommonSense I "understands" it fine. Instead of stating opinions without being constructive, can you please run the simple code provided in the question and post an answer with nothing more than input that proves you can break it in a way that parameterized queries won't break? I'll select it as the answer, and we'll all be edified. – messy Dec 08 '16 at 18:45
  • For those wondering what James is referring to, N.B. had made a comment labelling the technique as a "square" that I was trying to force into a "circle" when we already had a wheel. They also labelled the question as programmer comedy, but in extremely rude language, so it was reported and removed. – messy Dec 08 '16 at 18:49
  • @messy prepared statemens treat values internally, while your petty formatting is an external measure. It's just a more toilsome and ugly version of regular quoting/escaping: you take a string, you format it somehow, you put it in the query. While prepared statement is a completely different technology. – Your Common Sense Dec 08 '16 at 19:03
  • @YourCommonSense Why do you have to be so condescending? Please be polite. The question has nothing to do with difficulty of writing code, which is the only thing I can decipher from your opinions. The question asks if the techniques have security weaknesses compared with one another. As noted from links provided, it doesn't suffer from problems `mysql_escape..` has and seems equal to parameterized queries. I'll take it from your lack of engaging the question itself that you merely don't like this particular method. More than one way to skin a cat my friend, please try not to be so judgemental – messy Dec 08 '16 at 23:27
-3

Seems logical since all the data, even the user input that could cause injections, would be converted. I've been trying to find ways of bypassing it but I can't. So far the biggest disadvantages I see are: 1) From a database view you won't be able to make sense of the user's input. You'll have pull everything from the DB and convert it just to see it or manipulate it. 2) Whatever input you provide, the string length will get doubled after conversion.

  • 2
    Have you really been testing it? - Because your comments indicate that you haven't noticed that the data in the tables is NOT in hex. The hex is only a transfer encoding in this scheme. – messy Jun 30 '15 at 20:38