3

I have a string that I want to explode by commas but only if the comma is not nested inside some parentheses. This is a fairly common use case, and I have been reading through answered posts in this forum, but not really found what I am looking for.

So, in detail: the point is, i have a string (= SQL SELECT ... FROM statement), and I want to extract the elements from the list separated by comma encoded in this string (= the name of the columns that one wants to select from. However, these elements can contain brackets, and effectively be function calls. For example, in SQL one could do

SELECT TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate, nameoftheguy FROM shippings WHERE ...

Obviously, I would like to have an array now containing as first element

TO_CHAR(min(shippings.shippingdate), 'YYYY-MM-DD') as shippingdate

and as second element

nameoftheguy

The approaches I have followed so far are PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets), PHP: Split a string by comma(,) but ignoring anything inside square brackets?, Explode string except where surrounded by parentheses?, and PHP: split string on comma, but NOT when between braces or quotes? (focussing on the regex expressions therein, since I would like to do it with a single regex line), but in my little test area, those do not give the proper result. In fact, all of them split nothing or too much:

$Input: SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM
$Output: Array ( [0] => first [1] => second [2] => to_char [3] => (my,big,house) [4] => as [5] => bigly [6] => export [7] => (mastermind and others) [8] => as [9] => aloah )

The code of my test area is

<?php
function test($sql){
    $foo = preg_match("/SELECT(.*?)FROM/", $sql, $match);
    $bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
    //$bar = preg_match_all("/\((?:[^()]|(?R))+\)|'[^']*'|[^(),\s]+/", $match[1], $list);
    //$bar = preg_match_all("/[,]+(?![^\[]*\])/", $match[1], $list);
    //$bar = preg_match_all("/(?:[^(|]|\([^)]*\))+/", $match[1], $list);
    //$bar = preg_match_all("/[^(,\s]+|\([^)]+\)/", $match[1], $list);
    //$bar = preg_match_all("/([(].*?[)])|(\w)+/", $match[1], $list);
    print "<br/>";
    return $list[0];
}

print_r(test("SELECT first, second, to_char(my,big,house) as bigly, export(mastermind and others) as aloah FROM"));
?>

As you can imagine, I am not an regex expert, but I would like to do this splitting in a single line, if it is possible.

Federico klez Culloca
  • 26,308
  • 17
  • 56
  • 95
conni
  • 67
  • 6
  • 1
    You really should be looking at a proper parser. What will you do if you get a query with a subquery in the select list e.g. `SELECT x, (SELECT y FROM z) AS p, z FROM ...`? Your `$foo = preg_match("/SELECT(.*?)FROM/", $sql, $match);` code will not work properly in that case... – Nick Apr 10 '19 at 12:39
  • Assuming all parenthesis are balanced, you could split using something like [`,\s*(?![^(]*\))`](https://regex101.com/r/ye9hpr/1). – PJProudhon Apr 10 '19 at 12:41
  • Fair enough. In the first line I probably shouldn't do the regex in the way I do now because of what you just explained. But the second line (the one in question) should in this case (provided there is an answer in the way I imagine it) still return the proper list, i.e. ["x", "(SELECT y FROM z) AS p", "z"]. – conni Apr 10 '19 at 12:44
  • @PJProudhon with this I get an error "Warning: preg_match_all(): No ending delimiter ',' found" – conni Apr 10 '19 at 12:50

1 Answers1

0

Following the conversation here, I did write a parser to solve this problem. It is quite ugly, but it does the job (at least within some limitations). For completeness (if anybody else might run into the same question), I post it here:

function full($sqlu){
    $sqlu = strtoupper($sqlu);
    if(strpos($sqlu, "SELECT ")===false || strpos($sqlu, " FROM ")===false) return NULL;
    $def      = substr($sqlu, strpos($sqlu, "SELECT ")+7, strrpos($sqlu, " FROM ")-7);
    $raw      = explode(",", $def);
    $elements = array();
    $rem      = array();
    foreach($raw as $elm){
        array_push($rem, $elm);
        $txt = implode(",", $rem);
        if(substr_count($txt, "(") - substr_count($txt, ")") == 0){
            array_push($elements, $txt);
            $rem = array();
        }
    }
    return $elements;
}

When feeding it with the following string

SELECT first, second, to_char(my,(big, and, fancy),house) as bigly, (SELECT myVar,foo from z) as super, export(mastermind and others) as aloah FROM table

it returns

Array ( [0] => first [1] => second [2] => to_char(my,(big, and, fancy),house) as bigly [3] => (SELECT myVar,foo from z) as super [4] => export(mastermind and others) as aloah ) 
conni
  • 67
  • 6