1

I've been trying to extract something inside a string. I got the follwing string :

*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george

I want to extract by commas outside () and it is suppose to give this :

  • *
  • bob
  • DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd'))
  • george

I've been trying to use explode but it cut even inside ( and ) ... logic by the function mean.

So I've did this : [^(,\s]+|\([^)]+\) but it give cut even if a commas is found inside bracket.

Anyone know how to do what i mean ?

Thanks

EDIT :

Ok to be very clear and direct.

I got this : SELECT MyField, Field2, Blabla, Function(param), etc FROM table Blabla

I got the string MyField, Field2, Blabla, Function(param), etc already because the query is done by multiple function class like $DB->Select('MyField, Field2, Blabla, Function(param), etc'); but now I want to parse everything between commas so MyField, Field2, Blabla, Function(param), etc become this :

  • MyField
  • Field2
  • Blabla
  • Function(param)
  • etc
David Bélanger
  • 7,400
  • 4
  • 37
  • 55
  • 3
    SQL is an *irregular* language; matching/analyzing it with a *regular* expression is the incorrect way to go about this. It's the wrong tool for the job. (That's not to say that it won't work, it'll probably end up biting you in the rear later on, though.) – qJake May 25 '12 at 15:20
  • @SpikeX How you suggest to split a statement then ? SELECT **(this)** FROM – David Bélanger May 25 '12 at 15:21
  • Some form of string parsing logic written in PHP, since with PHP you have much greater control and a much wider selection of parsing tools available at your disposal (things like conditionals, loops, etc). – qJake May 25 '12 at 15:22
  • Can you show an entire sample query that you're trying to parse, and explain why/what data you need from it?? Perhaps there is a better way than RegEx to parse the query (maybe interact with the data BEFORE full query) or maybe there is no way to get what you want. – PenguinCoder May 25 '12 at 15:22
  • I should also add that parsing nested items with Regex is almost impossible... Since you have nested parenthesis in your example, Regex almost certainly won't work here. It's the same reason why you don't parse HTML or BBCode with Regex... parsing something like `A [b]Lazy [i]Fox[/i][/b] Jumps [u]Over [i]The[/i] Dog[/u].` is impossible without the aid of a programming language. – qJake May 25 '12 at 15:24
  • @SpikeX I know... I can use recursion. I just want help that's all. – David Bélanger May 25 '12 at 15:26
  • Googling this for about 5 seconds brought me here: http://code.google.com/p/php-sql-parser/ Would this be of any use? – qJake May 25 '12 at 15:27
  • 1
    @DavidBélanger [This will not *answer* your question](http://stackoverflow.com/questions/139926/regular-expression-to-match-common-sql-syntax), but it is relevant to your interest and I completly agree with the accepted answer. – PenguinCoder May 25 '12 at 15:28
  • After more research I came with that `(?>'[^']*'|"[^"]*"|\([^()]*(((?\()[^()]*)+((?<-open>\))[^()]*)+)*(?(open)(?!))\)|[^'",])+` and it does what I wanted to. – David Bélanger May 25 '12 at 15:35
  • @DavidBélanger: can you post an answer with your regex and `preg_*`? I really want to see how its done. Thanks. – flowfree May 25 '12 at 15:39
  • 1
    @DavidBélanger That Regex is going to come back to haunt you, I promise. You're trying to fasten a screw with a hammer. – qJake May 25 '12 at 15:42

6 Answers6

4

Posting this as an answer since it's probably better than anything else:

http://code.google.com/p/php-sql-parser/

Use that project to parse your SQL statements. The results come back as an array, including the bits in between SELECT and FROM as individual elements, just as you want. This will work far better than any regular expression solution you use.

qJake
  • 16,821
  • 17
  • 83
  • 135
  • I agree this is the **way** to go when parsing an SQL statement. But this is not I was looking for my need. – David Bélanger May 25 '12 at 15:59
  • Why not? You need to parse a SQL string and retrieve the elements of the `SELECT` portion of the string. Why does this not do what you want? – qJake May 25 '12 at 16:00
  • Because this is not what it intend to do. Why use a Civic instead of a Ferrari ? A huge class will take way more horse power then a simple function / regex. I don't need to parse complicated string, only simple as you saw. – David Bélanger May 25 '12 at 16:04
  • I think you're overestimating the overhead associated with that project. If it's all class-based, only the classes you invoke take up memory, not the entire project. – qJake May 25 '12 at 16:06
2

Here's what I cooked up, doesn't support multibyte characters:

Edit: added string awareness

<?php


$stack = array();
$stuff = array();

$escaping = false;
$input = "*, bob, [], DATE('g()d\\'f,gfd', ('Fd()sf)ds'), ('fdsfd\"\"()fsd')), ',(),() (,,'";
$len = strlen( $input );
$i = 0;
$curstr = "";
$char;

while( $i < $len ) {
    $char = $input[$i++];

    if( $escaping ) {
        $curstr .= $char;
        $escaping = false;
        continue;
    }

    switch( $char ) {

        case "\\":
            $escaping = true;
            break;

        case '"':
            $top = end( $stack );
            if( $top === '"' ) {
                array_pop( $stack );
            }
            else if( $top !== "'" ){
                $stack[] = '"';
            }

            $curstr .= $char;
            break;

        case "'":
            $top = end( $stack );
            if( $top === "'" ) {
                array_pop( $stack );
            }
            else if( $top !== '"' ) {
                $stack[] = "'";
            }

            $curstr .= $char;           
            break;

        case ",":
            if( count( $stack ) ) {
                $curstr .= $char;
            }
            else {
                $stuff[] = trim($curstr);
                $curstr = "";
            }
            break;

        case "(":
            $top = end( $stack );
            if( $top !== "'" && $top !== '"' ) {
                $stack[] = "(";                   
            }

            $curstr .= $char;
            break;

        case ")":
            $top = end( $stack );

            if( $top !== "'" && $top !== '"' ) {
                if( end($stack) !== "(" ) {
                    die( "Unbalanced parentheses" );
                }
                array_pop( $stack );
            }

            $curstr .= $char;


            break;

        default:
            $curstr .= $char;
            break;

    }
}

if( count( $stack ) ) {
    die( "Unbalanced ".end($stack) );
}

$stuff[] = trim( $curstr );

print_r( $stuff );

/*
    Array
(
    [0] => *
    [1] => bob
    [2] => []
    [3] => DATE('g()d'f,gfd', ('Fd()sf)ds'), ('fdsfd""()fsd'))
    [4] => ',(),() (,,'
)

*/
Esailija
  • 138,174
  • 23
  • 272
  • 326
  • As with my solution it won't work if there are brackets within quoted data. – diolemo May 25 '12 at 15:57
  • @diolemo what do you mean? brackets are treated as any normal character here (not comma, ( or ) ). Didn't see op wanted them to be treated specially? – Esailija May 25 '12 at 15:59
  • This is what I was not looking for, since I wanted regex. BUT, this is better and THIS is working well. Thanks a lot sir. – David Bélanger May 25 '12 at 16:00
  • Consider this example. `'test(1', 'test2'` We have to allow for brackets within quoted strings (those should not be considered). Take a look at my solution too as it is shorter (but it suffers the same problem). – diolemo May 25 '12 at 16:03
  • But why would I put `'test(1', 'test2'` inside my query ? Start with a quote and one paranthesis ? – David Bélanger May 25 '12 at 16:06
  • @diolemo ah, you mean parentheses. Yes, that's currently not supported but if needed, you can just add cases for `'` and `"` and use the stack for them as well to check if we are inside a string currently. – Esailija May 25 '12 at 16:06
  • 1
    @DavidBélanger it depends whether any part of your query is constructed from user provided data. If not then you are fine to use this or my solution. – diolemo May 25 '12 at 16:08
  • @Esailija Yes and you would also need to consider escape sequences such as `\'` and `\"` (and also consider whether the `\` was itself escaped). It gets messy. – diolemo May 25 '12 at 16:10
  • @diolemo I have added string awareness now in case it's needed. It didn't become much messier imo :P – Esailija May 25 '12 at 16:36
  • @Esailija I didn't mean to suggest that it wasn't possible I was just making clear to the OP that neither of our solutions handled it at present. – diolemo May 25 '12 at 16:38
  • @diolemo nevertheless, it was fun to do :D – Esailija May 25 '12 at 16:39
  • @Esailija You have neglected to include the fact that the slash can itself be escaped. Consider the example `'hello\\\\\\ooo', 'there'`. For this the results should be `hello\\\\\\ooo` and `there`. The quote is NOT escaped as it is the backslashes that are escaped. The key bit of information is that there is an even number of slashes. So yes my point remains. It is messy. You can probably do it in 10-15 lines but it is still messy to even think about. – diolemo May 25 '12 at 16:41
  • @Esailija Nevermind. Still the quotes added quite a few lines :P – diolemo May 25 '12 at 16:47
  • @diolemo True but the way I decided to handle it from beginning (stack) allowed for easy expansion. :P 'hello\\\\\\' is actually 3 backslashes because there is escaping from php when you do a string literal. – Esailija May 25 '12 at 16:48
  • @Esailija You should not allow the escape sequence when not inside a quoted string. Now I am just being picky right? OP has left long time ago... :P – diolemo May 25 '12 at 16:50
  • @diolemo I know, but backslashes outside of strings are not valid SQL in the first place I think – Esailija May 25 '12 at 16:51
  • @Esailija Yes the example is not a PHP string but a raw string. – diolemo May 25 '12 at 16:51
  • @Esailija OK was interesting to go over this with you. Now I have to get going. Bye. – diolemo May 25 '12 at 16:52
0

You stated in your comments that you're prepared to use recursion because you have nested lists. However, regex cannot do recursion. This is because regex cannot "count" anything indefinitely. Since it has no way of counting open/close parenthesis, it can't know how many levels in it is, or how many levels out it must go.

You can write horrendously complex regex to handle N levels of depth (see anubhava's answer), but as soon as you run across an expression with N+1 levels of depth your regex will fail. This is why we use programming languages to parse irregular languages because they can count recursion (see diolemo's answer). Within this recursion, we can use small bits of regex.

Community
  • 1
  • 1
dlras2
  • 8,416
  • 7
  • 51
  • 90
  • Explain this then : http://stackoverflow.com/questions/10624370/how-to-parse-a-line-and-result-in-array-stuck – David Bélanger May 25 '12 at 15:32
  • 2
    @DavidBélanger Notice how little regex there is in that solution. The regex isn't doing any more than character validation - it's not doing any of the parsing. You can use regex in recursion, but not recursion in regex. – dlras2 May 25 '12 at 15:49
0

This will work (for the most part). It will fail if you have brackets within quotes (part of the data). You can extend the code to handle quoted brackets if you want (but then you have to consider escaped quotes and everything like that. A regular expression will never work well.

Edit: Better to use the PHP SQL Parser as answered by SpikeX.

function unreliable_comma_explode($str)
{
   $last_split = 0;
   $len = strlen($str);
   $brackets = 0;
   $parts = array();

   for ($i = 0; $i < $len; $i++)
   {
      if ($str[$i] == '(') 
      {
         $brackets++;
         continue;
      }

      if ($str[$i] == ')')
      {
         if (--$brackets == -1) $brackets = 0;
         continue;
      }

      if ($str[$i] == ',' && $brackets == 0)
      {
         $parts[] = substr($str, $last_split, ($i-$last_split));
         $last_split = $i + 1;
      }
   }

   if (($len-$last_split) > 0)
      $parts[] = substr($str, $last_split, ($len-$last_split));

   return $parts;
}
diolemo
  • 2,621
  • 2
  • 21
  • 28
0

You can use this regex based code to get the split result the way you want:

$str = "*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george";
$arr = preg_split('/([^,]*(?:\([^)]*\))[^,]*)+|,/', $str, -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Update:

Though my original answer worked for the example that OP posted but due the concerns raised by some members I am posting a solution that will work with nested parenthesis as well as long as brackets are balanced:

$str = "*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]," .
       "john, MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd')), george";
$arr = preg_split('/\s*( [^,()]* \( ( [^()]* | (?R) )* \) [^,()]* ) ,?\s* | \s*,\s*/x',
                  $str, -1 , PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);

OUTPUT:

Array
(
    [0] => *
    [1] => bob
    [2] => DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]
    [3] => john
    [4] => MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd'))
    [5] => george
)

Caution: Even though this recursion based regex pattern works with deep nested brackets now it doesn't mean that this cannot be broken for some edge case situations (like unbalanced brackets).

anubhava
  • 761,203
  • 64
  • 569
  • 643
  • Breaks with: `"*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd')), george"` – diolemo May 25 '12 at 15:51
  • Just be careful - this will break if you have any loops nested further than this. – dlras2 May 25 '12 at 15:52
  • @Downvoter: Please care to explain the down vote? Everybody knows limitation of regex while dealing with nested matching brackets but isn't this code working with the example OP provided. – anubhava May 25 '12 at 15:56
  • I didn't downvote you, but you *are* assisting OP in shooting themself in the foot. At least be very explicit about what the limitations of your answer are. Often the question the OP asks is not the question they truly want answered, and your solution is extremely fragile. – dlras2 May 25 '12 at 16:34
  • 1
    I'm very curious how this actually works... I can't replicate your results. What does `(?R)` do? – dlras2 May 26 '12 at 03:02
  • 1
    @DanRasmussen: Here is the code Working Demo: http://ideone.com/4Eqhj and [here is a great tutorial on Recursive Reex in PHP](http://www.asiteaboutnothing.net/regexp/regex-recursion.html) – anubhava May 26 '12 at 04:21
  • @diolemo: Please check the edited answer now which works with deep nested brackets and as as well with your example string. – anubhava May 26 '12 at 04:40
  • 1
    @anubhava +2 for teaching me something completely new about regex. -1 for shattering my perception of "regular" expressions... – dlras2 May 29 '12 at 18:50
-1

I'm not really sure about what you want to do here.. But if you just want to extract strings. You can just use implode.

$array = array("*", "bob", "DATE('gdfgfd', 'Fdsfds', '(\"fdsfdfsd\"))", "george");
echo $test = implode($array, ",");
robert
  • 33,242
  • 8
  • 53
  • 74
  • He said implode didn't work because he wanted everything inside `DATE(...)` as a single entity. – qJake May 25 '12 at 15:43