Ok, I wrote a parser just for you. But a few things first.
This is not as trivial as it may first seem. Whenever you allow users to input "stuff" directly into sql you have to be very careful. So this method I use provides some level of sanitation to the data. This is because all the "bits" must match regular expressions to pass through. None of these provide quotes, backslashes or other things useful to sql injection. The only exception to that is the regular expression for encapsulated strings (strings within single quotes).
I must stress however that this is not a guarantee that it's impossible to pass SQL Injection code through it. I say this because I've spent very little time on it and tested it very little. The thing to remember is any part of that query string is susceptible to sql injection, not just the values. If you allow a user to pass in something like this:
"title = 'home' and description = 'this is just an example'"
They could just pass in this:
";DROP DATABASE"
Now there are protections against running multiple queries, but my point is simply doing a string replace or a simple Regx is not enough for this. I also added in a list of "Banned" words. These words cannot be used without enclosing them in single quotes. They are common operations in MySQL that should not be present in the WHERE clause. Some examples would be:
etc... Now because they are not defined in the switch statement within the function parse
they will get picked up by the default
case, which throws an exception.
There are also a lot of variation, I tried covering the most common things. These were not present in you examples. What I mean by this is stuff like so:
"title = 'home' OR title = 'user'"
multiple uses of the same column (with different values)
"title IN('home','user', 'foo', 1, 3)"
IN
"title IS NOT NULL"
NULLs
- Other operations, you only had
=
I included this regx '=|\<|\>|\>=|\<=|\<\>|!=|LIKE'
which should match =
, <
, >
, >=
, <=
, <>
, !=
and LIKE
Now I am sure I missed some, but these should give you some example on how to handle these sort of things. That is one benefit of this method, is that it's pretty strait forward to add a new token in and add some code to handle it. Therefor you can adapt it as the situation warrants.
Because this uses a while
loop it should handle any number of column -> value sets.
So this is what I came up with (based on lexical analysis):
//For debugging
error_reporting(-1);
ini_set('display_errors', 1);
echo "<pre>";
function parse($subject, $tokens)
{
$types = array_keys($tokens);
$patterns = [];
$lexer_stream = [];
$result = false;
foreach ($tokens as $k=>$v){
$patterns[] = "(?P<$k>$v)";
}
$pattern = "/".implode('|', $patterns)."/i";
if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
//print_r($matches);
foreach ($matches[0] as $key => $value) {
$match = [];
foreach ($types as $type) {
$match = $matches[$type][$key];
if (is_array($match) && $match[1] != -1) {
break;
}
}
$tok = [
'content' => $match[0],
'type' => $type,
'offset' => $match[1]
];
$lexer_stream[] = $tok;
}
$result = parseTokens( $lexer_stream );
}
return $result;
}
function parseTokens( array &$lexer_stream ){
$column = '';
$params = [];
$sql = '';
while($current = current($lexer_stream)){
$content = $current['content'];
$type = $current['type'];
switch($type){
case 'T_WHITESPACE':
case 'T_COMPARISON':
case 'T_PAREN_OPEN':
case 'T_PAREN_CLOSE':
case 'T_COMMA':
case 'T_SYMBOL':
$sql .= $content;
next($lexer_stream);
break;
case 'T_COLUMN':
$column = $content;
$sql .= $content;
next($lexer_stream);
break;
case 'T_OPPERATOR':
case 'T_NULL':
$column = '';
$sql .= $content;
next($lexer_stream);
break;
case 'T_ENCAP_STRING':
case 'T_NUMBER':
if(empty($column)){
throw new Exception('Parse error, value without a column name', 2001);
}
$value = trim($content,"'");
$palceholder = createPlaceholder($column, $value, $params);
$params[$palceholder] = $value;
$sql .= $palceholder;
next($lexer_stream);
break;
case 'T_IN':
$sql .= $content;
parseIN($column, $lexer_stream, $sql, $params);
break;
case 'T_EOF': return ['params' => $params, 'sql' => $sql];
case 'T_UNKNOWN':
case '':
default:
$content = htmlentities($content);
print_r($current);
throw new Exception("Unknown token $type value $content", 2000);
}
}
}
function createPlaceholder($column, $value, $params){
$placeholder = ":{$column}";
$i = 1;
while(isset($params[$placeholder])){
if($params[$placeholder] == $value){
break;
}
$placeholder = ":{$column}_{$i}";
++$i;
}
return $placeholder;
}
function parseIN($column, &$lexer_stream, &$sql, &$params){
next($lexer_stream);
while($current = current($lexer_stream)){
$content = $current['content'];
$type = $current['type'];
switch($type){
case 'T_WHITESPACE':
case 'T_COMMA':
$sql .= $content;
next($lexer_stream);
break;
case 'T_ENCAP_STRING':
case 'T_NUMBER':
if(empty($column)){
throw new Exception('Parse error, value without a column name', 2001);
}
$value = trim($content,"'");
$palceholder = createPlaceholder($column, $value, $params);
$params[$palceholder] = $value;
$sql .= $palceholder;
next($lexer_stream);
break;
case 'T_PAREN_CLOSE':
$sql .= $content;
next($lexer_stream);
return;
break;
case 'T_EOL':
throw new Exception("Unclosed call to IN()", 2003);
case 'T_UNKNOWN':
default:
$content = htmlentities($content);
print_r($current);
throw new Exception("Unknown token $type value $content", 2000);
}
}
throw new Exception("Unclosed call to IN()", 2003);
}
/**
* token should be "name" => "regx"
*
* Order is important
*
* @var array $tokens
*/
$tokens = [
'T_WHITESPACE' => '[\r\n\s\t]+',
'T_ENCAP_STRING' => '\'.*?(?<!\\\\)\'',
'T_NUMBER' => '\-?[0-9]+(?:\.[0-9]+)?',
'T_BANNED' => 'SELECT|INSERT|UPDATE|DROP|DELETE|ALTER|SHOW',
'T_COMPARISON' => '=|\<|\>|\>=|\<=|\<\>|!=|LIKE',
'T_OPPERATOR' => 'AND|OR',
'T_NULL' => 'IS NULL|IS NOT NULL',
'T_IN' => 'IN\s?\(',
'T_COLUMN' => '[a-z_]+',
'T_COMMA' => ',',
'T_PAREN_OPEN' => '\(',
'T_PAREN_CLOSE' => '\)',
'T_SYMBOL' => '[`]',
'T_EOF' => '\Z',
'T_UNKNOWN' => '.+?'
];
$tests = [
"title = 'home' and description = 'this is just an example'",
"title = 'home' OR title = 'user'",
"id = 1 or title = 'home'",
"title IN('home','user', 'foo', 1, 3)",
"title IS NOT NULL",
];
/* the loop here is for testing only, obviously call it one time */
foreach ($tests as $test){
print_r(parse($test,$tokens));
echo "\n".str_pad(" $test ", 100, "=", STR_PAD_BOTH)."\n";
}
This outputs:
Array
(
[params] => Array
(
[:title] => home
[:description] => this is just an example
)
[sql] => title = :title and description = :description
)
========== title = 'home' and description = 'this is just an example' ==========
Array
(
[params] => Array
(
[:title] => home
[:title_1] => user
)
[sql] => title = :title OR title = :title_1
)
======================= title = 'home' OR title = 'user' =======================
Array
(
[params] => Array
(
[:id] => 1
[:title] => home
)
[sql] => id = :id or title = :title
)
=========================== id = 1 or title = 'home' ===========================
Array
(
[params] => Array
(
[:title] => home
[:title_1] => user
[:title_2] => foo
[:title_3] => 1
[:title_4] => 3
)
[sql] => title IN(:title,:title_1, :title_2, :title_3, :title_4)
)
===================== title IN('home','user', 'foo', 1, 3) =====================
Array
(
[params] => Array
(
)
[sql] => title IS NOT NULL
)
============================== title IS NOT NULL ===============================
You can test it live here
Hope it works for you!