
Possible Duplicate:
Simulate php array language construct or parse with regexp?

Suppose I have the string:

$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string',array('asdf' => 'lalal')";

and I want to explode it into an array by comma, so that the desired end result is:

$explode[0] =  array(1,3,4);
$explode[1] = array(array(4,5,6));
$explode[2] = 'this is a comma , inside a string';
$explode[3] = array('asdf' => 'lalal');

Simply calling explode(',', $str) is not going to cut it, since there are also commas within those chunks.

Is there a way to explode this reliably even if there are commas inside the desired chunks?
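To make the problem concrete, here is a minimal sketch of what the naive call produces for the string above. The variable name `$pieces` is only illustrative.

```php
<?php
$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string',array('asdf' => 'lalal')";

// explode() knows nothing about parentheses or quotes, so it cuts at every comma.
$pieces = explode(',', $str);

echo count($pieces), "\n";            // far more than the 4 chunks wanted
print_r(array_slice($pieces, 0, 3));  // the first array is already torn apart
```

The first wanted chunk, `array(1,3,4)`, ends up spread over three separate elements.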

kamikaze_pilot
  • Where do you get that string from? If you are generating that string and storing it somewhere (and now you want to parse it), then I would suggest you use the [JSON format](http://www.php.net/manual/en/function.json-decode.php), as it would be easier to parse afterwards. – Janis Veinbergs Oct 24 '11 at 09:14
  • It's generated by an extremely complicated but crucial third-party function. If you can tell me how to automatically convert that string format into JSON, that would also be appreciated. – kamikaze_pilot Oct 24 '11 at 09:16
  • The problem is caused by your ambiguous delimiter. Are you able to change the outermost delimiter to a value other than ','? This would be the easiest solution. Another one could be a regular expression, but depending on the possible values of your string this regex could be really difficult to create. – TRD Oct 24 '11 at 09:18
  • Is the code trustable? If yes, `print_r(eval(sprintf('return array(%s);', $str)));` would do the trick. Though, be mindful of all the eval-problems. – Yoshi Oct 24 '11 at 09:26
  • @Yoshi: [At least validate](http://stackoverflow.com/questions/7873354/reliably-convert-string-containing-php-array-info-to-array/7874314#7874314). – hakre Oct 24 '11 at 10:52
  • @hakre As everyone who suggested eval, also wrote a disclaimer about using it, I guess that was understood. – Yoshi Oct 24 '11 at 10:57
  • @Yoshi: So how do you like my suggestion to deal with it? – hakre Oct 24 '11 at 10:59

2 Answers


Is there a way to explode this reliably even if there are commas inside the desired chunks?

PHP does not provide such a function out of the box. However, your string contains a compact subset of PHP, and PHP offers tools for exactly that: a PHP tokenizer and a PHP parser.

For your string format it is therefore possible to write a helper function that first validates the input against a list of allowed tokens and only then parses it with eval:

$str = "array(1,3,4),array(array(4,5,6)),'this is a comma , inside a string', array('asdf' => 'lalal')";

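function explode_string($str)
{
    // validate string: assume it is valid until a disallowed token shows up
    $isValid = TRUE;
    $tokens = token_get_all(sprintf('<?php %s', $str));
    array_shift($tokens); // drop the opening <?php tag token

    // use the token constants, the numeric ids differ between PHP versions
    $valid = array(T_LNUMBER, T_CONSTANT_ENCAPSED_STRING, T_DOUBLE_ARROW, T_ARRAY, T_WHITESPACE, '(', ')', ',');
    foreach($tokens as $token)
    {
        // array tokens carry their id first, single characters are plain strings
        list($index) = (array) $token;
        if (!in_array($index, $valid, TRUE))
        {
            $isValid = FALSE;
            break;
        }
    }
    if (!$isValid)
        throw new InvalidArgumentException('Invalid string.');

    // parse string: only reached if every token is on the whitelist
    $return = eval(sprintf('return array(%s);', $str));

    return $return;
}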

echo $str, "\n";

$result = explode_string($str);

var_dump($result);

The tokens used are:

T_LNUMBER (305)
T_CONSTANT_ENCAPSED_STRING (315)
T_DOUBLE_ARROW (358)
T_ARRAY (360)
T_WHITESPACE (371)

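The numbers in parentheses are the token ids on the PHP version this was written for; they can differ between PHP versions, so it is safer to refer to the constants. A token id can be turned back into its name with token_name.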

Which gives you:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 3
            [2] => 4
        )

    [1] => Array
        (
            [0] => Array
                (
                    [0] => 4
                    [1] => 5
                    [2] => 6
                )

        )

    [2] => this is a comma , inside a string
    [3] => Array
        (
            [asdf] => lalal
        )

)
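To check which tokens a given input actually produces, the tokenizer output can be mapped through token_name. The following is only a small sketch, and the sample string in `$sample` is made up for illustration.

```php
<?php
// List the tokens the tokenizer sees in a small sample string.
$sample = "array(1, 'a' => 2)";
$tokens = token_get_all('<?php ' . $sample);
array_shift($tokens); // drop the leading T_OPEN_TAG

$names = array();
foreach ($tokens as $token) {
    // Multi-character tokens are arrays (id, text, line);
    // single characters such as '(' or ',' are plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), "\n";
```

Running this against real input from the third-party function is a quick way to find out whether the whitelist needs more entries, for example T_DNUMBER if floats can occur.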
hakre
  • +1 - though, for documentation I'd use the [Token constants](http://www.php.net/manual/tokens.php) or at least note what tokens are used. ;) – Yoshi Oct 24 '11 at 11:03
  • @Yoshi: Edited and added a listing of tokens in use. – hakre Oct 24 '11 at 16:53
  • Just seeing it's very closely the same question here: [Simulate php array language construct or parse with regexp?](http://stackoverflow.com/q/3267951/367456) which has a *similar* answer: http://stackoverflow.com/a/3268443/367456 – hakre Oct 31 '12 at 13:52

You can write a simple parser:

function explode_str_arr($str) {
    $str.=',';
    $escape_char = '';
    $str_len = strlen($str);
    $cur_value = '';
    $return_arr = array();
    $cur_bracket_level = 0;
    for ($i = 0; $i < $str_len; $i++) {
        if ($escape_char) {
            if ($str[$i] === $escape_char) {
                $escape_char = '';
            }
            $cur_value.=$str[$i];
            continue;
        }

        switch ($str[$i]) {
            case '\'':
            case '"':
                $escape_char = $str[$i];
                break;
            case '(':
                $cur_bracket_level++;
                break;
            case ')':
                $cur_bracket_level--;
                break;
            case ',':
                if (!$cur_bracket_level) {
                    $return_arr[] = $cur_value;
                    $cur_value = '';
                    continue 2;
                }
        }
        $cur_value.=$str[$i];
    }
    return $return_arr;
}

It is ugly unicode-breaking fast code, but I think you may get the idea.
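One thing the loop above does not cover is a backslash-escaped quote inside a string, such as `'it\'s'`, which would end the string too early. A possible variant that also skips escaped characters could look like the sketch below. The function name `split_top_level` is hypothetical, and the sketch assumes the input uses PHP-style backslash escapes.

```php
<?php
// Hypothetical helper: split on top-level commas, honoring quotes,
// backslash escapes inside quotes, and parenthesis nesting.
function split_top_level($str)
{
    $chunks = array();
    $current = '';
    $quote = '';   // active quote character, or '' when outside a string
    $depth = 0;    // current parenthesis nesting level
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];

        if ($quote !== '') {
            $current .= $ch;
            if ($ch === '\\' && $i + 1 < $len) {
                // Copy the escaped character verbatim so \' does not end the string.
                $current .= $str[++$i];
            } elseif ($ch === $quote) {
                $quote = '';
            }
            continue;
        }

        if ($ch === "'" || $ch === '"') {
            $quote = $ch;
        } elseif ($ch === '(') {
            $depth++;
        } elseif ($ch === ')') {
            $depth--;
        } elseif ($ch === ',' && $depth === 0) {
            // A comma outside quotes and parentheses ends the current chunk.
            $chunks[] = trim($current);
            $current = '';
            continue;
        }
        $current .= $ch;
    }

    if (trim($current) !== '') {
        $chunks[] = trim($current);
    }
    return $chunks;
}

$str = "array(1,3,4),array(array(4,5,6)),'it\\'s a comma , here',array('asdf' => 'lalal')";
$parts = split_top_level($str);
print_r($parts);
```

Like the original, this returns the chunks as strings; it does not turn them into PHP arrays.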

XzKto
  • The parser you suggest is incomplete, it does not take care of arrays in arrays (biggest conceptual flaw I see) and it does not handle `=>`. – hakre Oct 24 '11 at 10:54
  • @hakre: yeah, nice example btw, tokens are yummy :). But one of us misunderstood the question (most likely me): I think OP wasn't asking how to create arrays etc., the only question I see is how to fix simple explode() to work on escaped strings. Maybe I'm blind ) – XzKto Oct 24 '11 at 11:25
  • No, that's a good point indeed. My parsing conception might be too complicated and I somewhat put my expectations on your answer. Because if the commas that prevent the splitting in explode can be skipped, that's already half of the job. – hakre Oct 24 '11 at 16:45