2

I need to retrieve table parent-child relationship from "WHERE" clause like this:

select ... large list of fields with aliases ...
from ... list of joined tables ...

where ((`db_name`.`catalog`.`group` = `db_name`.`catalog_group`.`iden`) 
and (`db_name`.`catalog`.`iden` = `db_name`.`catalog_sub`.`parent`))

Is there a some regex to get identifiers from each condition? Say in an array element[0] = table from the left side, element[1] is table from right. Ident's name may be any. So only sql operators like 'where' 'and' '=' may be keys.

Any help would be greatly appreciated.

CLARIFY

I dont want to get references from WHERE clause by WHERE clause. I just want references as such. So as could I see there may be regex to replace all sequences

`.` 

to

. 

and then match all backticked pairs by

` @ ` = ` @ `

Backticks around identifier always present in any may query by default. All string values surrounded by double quotes by default. I thought it's not a complex task for regex guru. Thanks in advance.

PS It's because myISAM engine does not support references I forced to restore in manually.

ENDED with:

public function initRef($q) {

    $s = strtolower($q);
    // remove all string values within double quotes
    $s = preg_replace('|"(\w+)"|', '', $q); 
    // split by 'where' clause
    $arr = explode('where', $s); 
    if (isset($arr[1])) { 
        // remove all spaces and parenthesis
        $s = preg_replace('/\s|\(|\}/', '', $arr[1]); 
        // replace `.` with .
        $s = preg_replace('/(`\.`)/', '.', $s);
        // replace `=` with =           
        $s = preg_replace("/(`=`)/", "=", $s); 
         // match pairs within ticks
        preg_match_all('/`.*?`/', $s, $matches);
        // recreate arr
        $arr = array();
        foreach($matches[0] as &$match) {
            $match = preg_replace('/`/', '', $match); // now remove all backticks
            $match = str_replace($this->db . '.', '', $match); // remove db_name
            $arr[] = explode('=', $match); // split by = sign
        }
        $this->pairs = $arr; 
    } else {
        $this->pairs = 0;
    }

}
kostya
  • 209
  • 3
  • 11
  • 1
    There perhaps is a regex (but it then is really advanced and perhaps hard to deal with), but there are parsers for SQL already. Like for example [PHP SQL Parser](https://code.google.com/p/php-sql-parser/) (no preference) – hakre Aug 20 '13 at 14:57
  • 1
    Regular expressions are not a magic wand you wave at every problem that happens to involve strings. – Andy Lester Aug 20 '13 at 15:46

1 Answers1

2

Using a regular expression seems like it won't help you. What if there are subqueries? What if your query contains a string with the text "WHERE" in it? Hakre mentioned it in a comment above, but your best bet really is using something that can actually interpret your SQL so that you can find what really is a proper WHERE clause and what is not.

If you insist on doing this the "wrong" way instead of by using some context aware parser, you would have to find the WHERE clause, for instance like this:

$parts = explode('WHERE', $query);

Assuming there is only one WHERE clause in your query, $parts[1] will then contain everything from the WHERE onwards. After that you would have to detect all valid clauses like ORDER BY, GROUP BY, LIMIT, etc. that could follow, and break off your string there. Something like this:

$parts = preg_split("/(GROUP BY|ORDER BY|LIMIT)|/", $parts[1]);
$where = $parts[0];

You would have to check the documentation for your flavor of SQL and the types of queries (SELECT, INSERT, UPDATE, etc.) you want to support for a full list of keywords that you want to split on.

After that, it would probably help to remove all brackets because precedence is not relevant for your problem and they make it harder to parse.

$where = preg_replace("/[()]/", "", $where);

From that point onward, you'd have to split again to find all the separate conditions:

$conditions = preg_split("/(AND|OR|XOR)/", $where);

And lastly, you'd have to split on operators to get the right and left values:

foreach ($conditions as $c)
{
    $idents = preg_split("/(<>|=|>|<|IS|IS NOT)/");
}

You would have to check that list of operators and add to it if needed. $idents now has all possible identifiers in it.

You may want to note that several of these steps (at the very least the last step) will also require trimming of the string to work properly.

Disclaimer: again, I think this is a very bad idea. This code only works if there is only one WHERE clause and even then it depends on a lot of assumptions. A complicated query will probably break this code. Please use a SQL parser/interpreter instead.

  • Thanks for reply. I will take all your suggestion into accout. – kostya Aug 21 '13 at 02:39
  • Ok, will try step-by-step regex in human words. Regex must 1 replace or ignore all strings within double quotes. 2. Separate part of query behind where clause. 4. Replace all backtick dot backtick seq to dot. 5. Get pairs of ident by backtick equal backtick sequence. – kostya Aug 21 '13 at 03:38
  • If you can be certain that every identifier your want to match is encased in backticks, things get a bit simpler. You'd still not want to use regular expressions in that case though, use a stack based parser instead. That may sound complicated but this one only needs to have 3 modes: either you're reading an identifier, or you're not and in case you're not, you only need to be aware of strings in case they contain backticks. – Jeroen van den Broek Aug 21 '13 at 10:49