-1

I'm trying to write a query builder for PDO prepared statements.

I have a WHERE statement as a string, like;

"title = 'home' and description = 'this is just an example'"
"id = 1 or title = 'home'"
"title = home"
etc...

The WHERE statement could contain user supplied data and needs to be sanitised, from what I've read, using prepared statements is a widely used approach for doing this?

I need to split the where string to create a new string like;

$where = "title = :title AND description = :description";

and an array like;

$params = array(':title' => 'home', :description = 'this is just an example');

The difficulty for me being that I have no idea how many different filters will be passed in the original string.

Any help on how to achieve this would be greatly appreciated.

My function that takes both the above split primitives is below;

function select($table, $fields = array(), $where = "", $params = array(), $limit = '', $fetchStyle = PDO::FETCH_ASSOC) {
    global $dbc, $dbq;

    if (empty($fields)) {
        $fields = "*";
    } else {
        $fields = implode(', ', $fields);
    }

    if (empty($where)) {
        $where = "1";
    }

    if ($limit != '' && is_int($limit)) {
        $limit_include = "LIMIT $limit";
    }

    //create query
    $query = "SELECT $fields FROM $table WHERE $where $limit_include";

    //prepare statement
    $dbq = $dbc->query($query);
    $dbq->execute($params);

    return $dbq->fetchAll($fetchStyle);
}
Jamie
  • 159
  • 3
  • 9
  • Personally I would make a lexer/parser sort of setup for it. You can see an example of this method on my github https://github.com/ArtisticPhoenix/MISC/blob/master/JasonDecoder.php this was done for parsing Jason Objects, ( Jason missing the quotes ) The same method can be used for parsing any string, you just need to build the tokens and parse them. I can set one up for you but it will take a minute, and even then you'll likly have to make changes for things like functions `WHERE DATE(create_date) = ` etc. – ArtisticPhoenix Jan 01 '18 at 03:50
  • 1
    Ok, here is an empty version. I don't mind making one for you but it is new years so I might not finish it tell next year.... https://github.com/ArtisticPhoenix/MISC/blob/master/emptyLexer.php – ArtisticPhoenix Jan 01 '18 at 04:03
  • thanks @ArtisticPhoenix, i'll check this out – Jamie Jan 01 '18 at 04:24
  • I need to pass the `select` function a - string containing the constructed WHERE clause for use in the SQL template, so for example; `'title = :title AND description = :description"` the value ':title' and ':description' are placeholders using the sql template and need to be constructed, and - an assoc array where the key is the generated placeholder and value is the value from the original string, e.g. `array(':title' => 'home',...)` I hope this makes sense? – Jamie Jan 01 '18 at 05:06
  • This example string is incorrect `"title = home"` or is home a column. – ArtisticPhoenix Jan 01 '18 at 06:00
  • title is the column, ‘home’ is the value to match (in this case) – Jamie Jan 01 '18 at 07:01
  • Right, but it's not single quoted. This would indicate that it's a column name and not a string if it was SQL. – ArtisticPhoenix Jan 01 '18 at 07:15
  • Just to put it straight: this is NOT a query builder but unusable franknstein that you will abandon after much struggle. I pity the time you are going to waste – Your Common Sense Jan 01 '18 at 11:19
  • Not to mention that a black list offered to you in the answer is anything but safe approach. – Your Common Sense Jan 01 '18 at 11:25
  • Super constructive! If only I can get the time back I wasted reading your comment. – Jamie Jan 01 '18 at 11:26
  • May be it will pay back, helping you to realize the dead end you are going to put yourself in a bit sooner. – Your Common Sense Jan 01 '18 at 11:32
  • Again, thanks heaps for your really constructive feedback, thoughts on how to improve and tackle this better and stellar attitude. You make this community a better place! – Jamie Jan 01 '18 at 11:42

1 Answers1

1

Ok, I wrote a parser just for you. But a few things first.

This is not as trivial as it may first seem. Whenever you allow users to input "stuff" directly into sql you have to be very careful. So this method I use provides some level of sanitation to the data. This is because all the "bits" must match regular expressions to pass through. None of these provide quotes, backslashes or other things useful to sql injection. The only exception to that is the regular expression for encapsulated strings (strings within single quotes).

I must stress however that this is not a guarantee that it's impossible to pass SQL Injection code through it. I say this because I've spent very little time on it and tested it very little. The thing to remember is any part of that query string is susceptible to sql injection, not just the values. If you allow a user to pass in something like this:

   "title = 'home' and description = 'this is just an example'"

They could just pass in this:

   ";DROP DATABASE"

Now there are protections against running multiple queries, but my point is simply doing a string replace or a simple Regx is not enough for this. I also added in a list of "Banned" words. These words cannot be used without enclosing them in single quotes. They are common operations in MySQL that should not be present in the WHERE clause. Some examples would be:

  • DROP
  • DELETE
  • SHOW
  • ALTER

etc... Now because they are not defined in the switch statement within the function parse they will get picked up by the default case, which throws an exception.

There are also a lot of variation, I tried covering the most common things. These were not present in you examples. What I mean by this is stuff like so:

  • "title = 'home' OR title = 'user'" multiple uses of the same column (with different values)
  • "title IN('home','user', 'foo', 1, 3)" IN
  • "title IS NOT NULL" NULLs
  • Other operations, you only had = I included this regx '=|\<|\>|\>=|\<=|\<\>|!=|LIKE' which should match =, <, >, >=, <=, <>, != and LIKE

Now I am sure I missed some, but these should give you some example on how to handle these sort of things. That is one benefit of this method, is that it's pretty strait forward to add a new token in and add some code to handle it. Therefor you can adapt it as the situation warrants.

Because this uses a while loop it should handle any number of column -> value sets.

So this is what I came up with (based on lexical analysis):

//For debugging
error_reporting(-1);
ini_set('display_errors', 1);
echo "<pre>";

function parse($subject, $tokens)
{
    $types = array_keys($tokens);
    $patterns = [];
    $lexer_stream = [];
    $result = false;
    foreach ($tokens as $k=>$v){
        $patterns[] = "(?P<$k>$v)";
    }
    $pattern = "/".implode('|', $patterns)."/i";
    if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        //print_r($matches);
        foreach ($matches[0] as $key => $value) {
            $match = [];
            foreach ($types as $type) {
                $match = $matches[$type][$key];
                if (is_array($match) && $match[1] != -1) {
                    break;
                }
            }
            $tok  = [
                'content' => $match[0],
                'type' => $type,
                'offset' => $match[1]
            ];
            $lexer_stream[] = $tok;
        }
        $result = parseTokens( $lexer_stream );
    }
    return $result;
}
function parseTokens( array &$lexer_stream ){

    $column = '';
    $params = [];
    $sql = '';

    while($current = current($lexer_stream)){
        $content = $current['content'];
        $type = $current['type'];
        switch($type){
            case 'T_WHITESPACE':
            case 'T_COMPARISON':
            case 'T_PAREN_OPEN':
            case 'T_PAREN_CLOSE':
            case 'T_COMMA':
            case 'T_SYMBOL':
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_COLUMN':
                $column = $content;
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_OPPERATOR':
            case 'T_NULL':
                $column = '';
                $sql .= $content;
                next($lexer_stream);
            break;
            case 'T_ENCAP_STRING': 
            case 'T_NUMBER':
                if(empty($column)){
                    throw new Exception('Parse error, value without a column name', 2001);
                }

                $value = trim($content,"'");

                $palceholder = createPlaceholder($column, $value, $params);

                $params[$palceholder] = $value;
                $sql .= $palceholder;
                next($lexer_stream);
            break;
            case 'T_IN':
                $sql .= $content;
                parseIN($column, $lexer_stream, $sql, $params);
            break;
            case 'T_EOF': return ['params' => $params, 'sql' => $sql];

            case 'T_UNKNOWN':
            case '':
            default:
                $content = htmlentities($content);
                print_r($current);
                throw new Exception("Unknown token $type value $content", 2000);
        }
    }
}

function createPlaceholder($column, $value, $params){
    $placeholder = ":{$column}";

    $i = 1;
    while(isset($params[$placeholder])){

        if($params[$placeholder] == $value){
            break;
        }

        $placeholder = ":{$column}_{$i}";
        ++$i;
    }

    return $placeholder;
}

function parseIN($column, &$lexer_stream, &$sql, &$params){
    next($lexer_stream);

    while($current = current($lexer_stream)){
        $content = $current['content'];
        $type = $current['type'];
        switch($type){
            case 'T_WHITESPACE':
            case 'T_COMMA':
                $sql .= $content;
                next($lexer_stream);
            break; 
            case 'T_ENCAP_STRING':
            case 'T_NUMBER':
                if(empty($column)){
                    throw new Exception('Parse error, value without a column name', 2001);
                }

                $value = trim($content,"'");

                $palceholder = createPlaceholder($column, $value, $params);

                $params[$palceholder] = $value;
                $sql .= $palceholder;
                next($lexer_stream);
            break;    
            case 'T_PAREN_CLOSE':
                $sql .= $content;
                next($lexer_stream);
                return;
            break;          
            case 'T_EOL':
                throw new Exception("Unclosed call to IN()", 2003);

            case 'T_UNKNOWN':
            default:
                $content = htmlentities($content);
                print_r($current);
                throw new Exception("Unknown token $type value $content", 2000);
        }
    }
    throw new Exception("Unclosed call to IN()", 2003);
}

/**
 * token should be "name" => "regx"
 * 
 * Order is important
 * 
 * @var array $tokens
 */
$tokens = [
    'T_WHITESPACE'      => '[\r\n\s\t]+',
    'T_ENCAP_STRING'    => '\'.*?(?<!\\\\)\'',
    'T_NUMBER'          => '\-?[0-9]+(?:\.[0-9]+)?',
    'T_BANNED'          => 'SELECT|INSERT|UPDATE|DROP|DELETE|ALTER|SHOW',
    'T_COMPARISON'      => '=|\<|\>|\>=|\<=|\<\>|!=|LIKE',
    'T_OPPERATOR'       => 'AND|OR',
    'T_NULL'            => 'IS NULL|IS NOT NULL',
    'T_IN'              => 'IN\s?\(',
    'T_COLUMN'          => '[a-z_]+',
    'T_COMMA'           => ',',
    'T_PAREN_OPEN'      => '\(',
    'T_PAREN_CLOSE'      => '\)',
    'T_SYMBOL'          => '[`]',
    'T_EOF'             => '\Z',
    'T_UNKNOWN'         => '.+?'
];

$tests = [
    "title = 'home' and description = 'this is just an example'",
    "title = 'home' OR title = 'user'",
    "id = 1 or title = 'home'",
    "title IN('home','user', 'foo', 1, 3)",
    "title IS NOT NULL",
];

/* the loop here is for testing only, obviously call it one time */
foreach ($tests as $test){   
    print_r(parse($test,$tokens));
    echo "\n".str_pad(" $test ", 100, "=", STR_PAD_BOTH)."\n";  
}

This outputs:

Array
(
    [params] => Array
        (
            [:title] => home
            [:description] => this is just an example
        )

    [sql] => title = :title and description = :description
)

========== title = 'home' and description = 'this is just an example' ==========
Array
(
    [params] => Array
        (
            [:title] => home
            [:title_1] => user
        )

    [sql] => title = :title OR title = :title_1
)

======================= title = 'home' OR title = 'user' =======================
Array
(
    [params] => Array
        (
            [:id] => 1
            [:title] => home
        )

    [sql] => id = :id or title = :title
)

=========================== id = 1 or title = 'home' ===========================
Array
(
    [params] => Array
        (
            [:title] => home
            [:title_1] => user
            [:title_2] => foo
            [:title_3] => 1
            [:title_4] => 3
        )

    [sql] => title IN(:title,:title_1, :title_2, :title_3, :title_4)
)

===================== title IN('home','user', 'foo', 1, 3) =====================
Array
(
    [params] => Array
        (
        )

    [sql] => title IS NOT NULL
)

============================== title IS NOT NULL ===============================

You can test it live here

Hope it works for you!

ArtisticPhoenix
  • 21,464
  • 2
  • 24
  • 38
  • this is an amazing response. I'm going to read through your code and once I understand it i'll drop it in and make it work. This was definitely not a simple question, in fact for me it was quiet complex. I think I can learn a great deal form this code and i really appreciate the time and effort you've taken to write a detailed explanation but also extra thought put in to think more broadly about my question. – Jamie Jan 01 '18 at 07:51
  • also, just for your info, since the primitives this generates are used in prepared statements, the SQL is precompiled and the values should not affect the SQL (i.e. `";DROP DATABASE"` shouldn't be an issue as the value is inserted after the SQL is compiled, however it's fantastic you've thought about these things. The project i'm working on is a small personal project just for learning, but your response will no doubt teach me much more than I had intended in my original question, thank you again! – Jamie Jan 01 '18 at 07:54
  • See that's where you are wrong, that is not a value, as you are allowing them to put the column names and the comparison operators, in other words the whole Where clause, not just the "primitives" as you call them. Any part of a query is susceptible to Injection, if you allow users to supply it. – ArtisticPhoenix Jan 01 '18 at 09:11
  • Thanks for the comment, I did some further reading and it seems PDO prepared statements can help avoid some simple injection attacks but it would seem further steps need to be taken for complex or second level attacks. I found the [following SO](https://stackoverflow.com/questions/134099/are-pdo-prepared-statements-sufficient-to-prevent-sql-injection) article really interesting. – Jamie Jan 01 '18 at 10:40
  • @Jamie What is written in that answer you've found is a complete rubbish. [What a second order SQL injection really is](https://phpdelusions.net/top#second_order_sql_injection) – Your Common Sense Jan 01 '18 at 12:24
  • Thanks for providing the link, i'm here to further my knowledge and learn. That explanation is along the lines of my previous thinking on PDO prepared statements. I found [(The only proper) PDO tutorial](https://phpdelusions.net/pdo) on the same site very helpful, so far. thank you. – Jamie Jan 01 '18 at 12:46