I'm working on an older server where I can't use prepared statements, so I am trying to fully escape user input before sending it to MySQL. For this I am using the PHP function mysql_real_escape_string.

Since this function does not escape the MySQL wildcards % and _ I am using addcslashes to escape these as well.

When I send something like:

test_test " ' 

to the database and then read it back the database shows:

test\_test " ' 

Looking at this I can't understand why the _ has a preceding backslash but the " and ' don't. Since they are all escaped with \ surely _ ' and " should all appear the same, i.e. all have the escape character visible or all not have it visible.

Are the escaping \s automatically screened out for

Can anyone explain this?

BoltClock
Columbo

2 Answers

_ and % are not wildcards in MySQL in general, and should not be escaped for the purposes of putting them into normal string literals. mysql_real_escape_string is correct and sufficient for this purpose. addcslashes should not be used.
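To connect this to the output in the question, here is a minimal sketch of what the extra addcslashes call does to the sample input. The variable names are illustrative only. The added backslash is real data by the time it reaches MySQL: in a plain string literal MySQL documents that \_ and \% are kept as-is, whereas \' and \" are reduced to the bare quote characters, which is why only the underscore comes back with a visible backslash.

```php
<?php
// The sample input from the question: test_test " '
$input = 'test_test " \'';

// addcslashes() puts a backslash in front of every character listed in
// the second argument, here the two LIKE wildcards.
$doubleEscaped = addcslashes($input, '%_');

// The quotes are untouched at this stage; only the underscore gains a backslash.
echo $doubleEscaped, "\n"; // prints: test\_test " '
```

Dropping the addcslashes call and relying on mysql_real_escape_string alone stores the value unchanged.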

_ and % are special solely in the context of LIKE-matching. When you want to prepare strings for literal use in a LIKE statement, so that 100% matches one-hundred-percent and not just any string starting with a hundred, you have two levels of escaping to worry about.

The first is LIKE escaping. LIKE handling takes place entirely inside SQL, and if you want to turn a literal string into a literal LIKE expression you must perform this step even if you are using parameterised queries!

In this scheme, _ and % are special and must be escaped. The escape character must also be escaped. According to ANSI SQL, characters other than these must not be escaped: \' would be wrong. (Though MySQL will typically let you get away with it.)

Having done this, you proceed to the second level of escaping, which is plain old string literal escaping. This takes place outside of SQL, creating SQL, so must be done after the LIKE escaping step. For MySQL, this is mysql_real_escape_string as before; for other databases there will be a different function, or you can just use parameterised queries to avoid having to do it.

The problem that leads to confusion here is that MySQL uses a backslash as an escape character for both of the nested escaping steps! So if you wanted to match a string against a literal percent sign you would have to double-backslash-escape and say LIKE 'something\\%'. Or, if that's in a PHP " literal which also uses backslash escaping, "LIKE 'something\\\\%'". Argh!
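The two nested levels can be traced step by step with a small sketch. Both function names are hypothetical, and literal_escape_demo is only a stand-in for mysql_real_escape_string that handles the backslash and the single quote; it is there to make the doubling visible, not to be used against a real database.

```php
<?php
// Level 1: LIKE escaping with MySQL's default escape character, the backslash.
// The backslash itself is replaced first so the ones added for _ and % are
// not doubled again.
function like_escape_backslash(string $s): string {
    return str_replace(['\\', '_', '%'], ['\\\\', '\\_', '\\%'], $s);
}

// Level 2: string-literal escaping. Hypothetical, illustration-only stand-in
// for mysql_real_escape_string (covers backslash and single quote only).
function literal_escape_demo(string $s): string {
    return str_replace(['\\', "'"], ['\\\\', "\\'"], $s);
}

$raw         = '100%';
$likePattern = like_escape_backslash($raw);       // one backslash before %
$sqlLiteral  = literal_escape_demo($likePattern); // two backslashes before %

echo "... WHERE name LIKE '" . $sqlLiteral . "'\n";
```

Each level doubles the backslashes produced by the previous one, which is where the pile-up comes from.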

This is incorrect according to ANSI SQL, which says that: in string literals backslashes mean literal backslashes and the way to escape a single quote is ''; in LIKE expressions there is no escape character at all by default.

So if you want to LIKE-escape in a portable way, you should override the default (wrong) behaviour and specify your own escape character, using the LIKE ... ESCAPE ... construct. For sanity, we'll choose something other than the damn backslash!

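// LIKE-escape $s using escape character $e. The escape character itself
// is replaced first, then the two wildcards.
function like($s, $e) {
    return str_replace(array($e, '_', '%'), array($e.$e, $e.'_', $e.'%'), $s);
}

// LIKE-escape first, then escape for the SQL string literal.
$escapedname= mysql_real_escape_string(like($name, '='));
$query= "... WHERE name LIKE '%$escapedname%' ESCAPE '=' AND ...";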

or with parameters (eg. in PDO):

$q= $db->prepare("... WHERE name LIKE ? ESCAPE '=' AND ...");
$q->bindValue(1, '%'.like($name, '=').'%', PDO::PARAM_STR);

(If you want more portability party time, you can also have fun trying to account for MS SQL Server and Sybase, where the [ character is also, incorrectly, special in a LIKE statement and has to be escaped. argh.)
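For the SQL Server and Sybase case, one possible variation is to add [ to the list of characters that get the escape prefix. This is a sketch under that assumption; the name like_brackets is made up for illustration. It should only be used when targeting those databases, since ANSI SQL does not allow escaping a character that is not special.

```php
<?php
// Hypothetical variant of the like() helper above for databases where
// [ is also special inside a LIKE pattern.
// $e is the escape character that will be named in the ESCAPE clause.
function like_brackets(string $s, string $e): string {
    // The escape character is replaced first so the prefixes added for
    // the other characters are not doubled again.
    return str_replace(
        [$e, '_', '%', '['],
        [$e . $e, $e . '_', $e . '%', $e . '['],
        $s
    );
}

echo like_brackets('a[b]_50%', '='), "\n"; // prints: a=[b]=_50=%
```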

bobince
  • I would +1 again for "the damn backslash!". – BoltClock Sep 10 '10 at 10:30
  • Thanks, just absorbing this now...this is really helping me expand my basic knowledge. Stupidly, I was escaping % and _ even though I'm not actually using any LIKE statements and since I think (please confirm) that % and _ are only wild in the context of a LIKE statement, I am in fact wasting my time. But then that makes me think why would you ever want to escape a % or _ when it's in the context of a LIKE statement. Surely the only reason to use a LIKE statement is so you can use it's wild characters. (please excuse my limited knowledge on this) – Columbo Sep 10 '10 at 11:06
  • Sure, but it's perfectly natural to want to be able to search for a literal `%` or `_` character. If a user searches for `50%` in the front end, they probably mean they're looking for a string containing `50%` and not just any string with `50` in it. – bobince Sep 10 '10 at 11:08
  • Sorry, yes, of course, I see now. I will keep a bookmark on this post. Thanks for your help. – Columbo Sep 10 '10 at 11:14
  • I can't edit the answer, but there is a small bug: In str_replace() line: $e_ is a variable that doesn't exist. Instead, use "{$e}_" – Michael Butler Jul 08 '11 at 14:28
  • @Michael: good point, changed to use concatenation for clarity. – bobince Jul 09 '11 at 14:28
  • This is such a great answer, great explanation and great workaround. But reading http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html#operator_like I feel I miss the part in which they talk about quadruple backslashes. As far as I understand, this double-level escaping is **another difference** we have to take in mind when working with LIKE and not a simple string... Am I wrong? @bobince, master, can you correct me? – Áxel Costas Pena Nov 28 '12 at 15:59
  • It's the same two levels of escaping as in this answer - to LIKE-match a literal backslash when your escape character is a backslash, it's two backslashes; to include the LIKE query in a MySQL string literal, that's a second layer of encoding each backslash, resulting in four. This confusion is a good reason to (a) always specify a non-backslash `ESCAPE`, and (b) always use parameterised queries so you don't have to worry about the SQL escaping layer. – bobince Nov 29 '12 at 11:50
  • Also watch out for https://bugs.mysql.com/bug.php?id=39808 when using eg utf8mb4 language specific collations (see list here https://hastebin.com/acoqedajij). – Bell Feb 21 '17 at 16:07
  • @Bell ... you know, I just spent the day trying to figure out why `Field LIKE '%!!%' ESCAPE '!'` was returning zero results (skipping all that had a literal `!` in them). Read that link you posted, and yup, described the issue exactly. Here, in 2020 ... bug still exists. NOT using a custom escape works fine though... but then back to backslash hell. – IncredibleHat Jul 30 '20 at 16:51

Surprised no one bothered to mention it after all these years, but if you don't need to do complex wildcard matching (e.g. foo%baz), I think INSTR/LOCATE/POSITION, LEFT, RIGHT, etc. should suffice. In all of my cases, I only used LIKE to match anywhere in a string (that is, for example %foobar%), so after all the horror stories about escaping LIKE patterns, I'm now using INSTR instead.

Equivalent of value LIKE '%foobar%' (match anywhere):

INSTR(value, 'foobar') > 0

Equivalent of value LIKE 'foobar%' (match at start):

INSTR(value, 'foobar') = 1

Equivalent of value LIKE '%foobar' (match at end):

RIGHT(value, 6) = 'foobar'

It might not be as straightforward or as easy to remember, and the solution for matching at the end could perhaps be improved somehow to be more universal. But these alternatives should hopefully at least give you some peace of mind in terms of security, as they bypass the need for any self-rolled escaping and don't require you to alter the actual parameter values (when using prepared statements anyway).
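One way the end-of-string case might be made more universal is to let MySQL compute the length with CHAR_LENGTH, binding the same value twice. The helper below is a hypothetical sketch that only builds the SQL fragment and the list of values to bind; the function name, the mode names and the $db handle mentioned afterwards are assumptions, and the column name must come from trusted code, never from user input.

```php
<?php
// Hypothetical helper: returns [SQL fragment, values to bind in order].
// $column is interpolated directly, so it must be a trusted identifier.
function substring_condition(string $column, string $needle, string $mode): array {
    switch ($mode) {
        case 'contains': // like: column LIKE '%needle%'
            return ["INSTR($column, ?) > 0", [$needle]];
        case 'starts':   // like: column LIKE 'needle%'
            return ["INSTR($column, ?) = 1", [$needle]];
        case 'ends':     // like: column LIKE '%needle'
            // The needle is bound twice: once for its length, once for the comparison.
            return ["RIGHT($column, CHAR_LENGTH(?)) = ?", [$needle, $needle]];
        default:
            throw new InvalidArgumentException("Unknown mode: $mode");
    }
}

list($sql, $params) = substring_condition('value', '50%', 'ends');
echo $sql, "\n"; // prints: RIGHT(value, CHAR_LENGTH(?)) = ?
```

With PDO this could then be run as $db->prepare("... WHERE $sql") followed by execute($params), so the search text is never altered or escaped by hand.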

user966939