
I have never really thought about this until today, but after searching the web I didn't really find anything. Maybe I wasn't wording it right in the search.

Given an array (of multiple dimensions or not):

$data = array('this' => array('is' => 'the'), 'challenge' => array('for' => array('you')));

When var_dumped:

array(2) { ["this"]=> array(1) { ["is"]=> string(3) "the" } ["challenge"]=> array(1) { ["for"]=> array(1) { [0]=> string(3) "you" } } }

The challenge is this: What is the best, most optimized method for rebuilding a usable PHP array from this output? Something like an undump_var() function. It should work whether the data is all on one line (as output in a browser) or contains line breaks (as output to a terminal).

Is it just a matter of regex? Or is there some other way? I am looking for creativity.

UPDATE: Note that I am familiar with serialize and unserialize, folks. I am not looking for alternative solutions. This is a code challenge to see if it can be done in an optimized and creative way. So serialize and var_export are not solutions here, nor are they the best answers.
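Any undump function starts from the text that var_dump() prints. Here is a minimal sketch of capturing that text into a string with output buffering, the ob_start()/ob_get_clean() approach also suggested in the comments below. `dumpToString` is a hypothetical helper name. With plain PHP (no extension such as Xdebug reformatting var_dump), the captured text keeps its newlines and two-space indentation:

```php
<?php
// Hypothetical helper: return what var_dump() would print, as a string.
function dumpToString($value): string
{
    ob_start();            // start capturing output
    var_dump($value);
    return ob_get_clean(); // stop capturing and return the buffer
}

$dump = dumpToString(['is' => 'the']);
echo $dump;
// array(1) {
//   ["is"]=>
//   string(3) "the"
// }
```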

Amal Murali
Chuck Burgess
  • Yes, it's possible by parsing it. No, it's not something you'd usually want to bother with, since you're doing something wrong if you really need this. Maybe make a Community Wiki Code Golf question out of this; then there's something to it. – deceze Aug 20 '10 at 14:37
  • It's definitely possible, but it's not going to be trivial since the syntax is not meant to be machine parsable. When you have things like `string(8) "Foo"bar"` and other weird edge cases, it's going to make it relatively messy to implement in a reliable manner... If there are elegant solutions, I'd love to see them. But realize that most fully working solutions will likely be rather lengthy and have a fair bit of logic inside... – ircmaxell Aug 20 '10 at 15:03
  • What's wrong with `var_export()`? – NullUserException Aug 20 '10 at 15:29
  • Nothing... except this question is not about using alternatives to var_dump. It's about taking an already var_dumped string and returning it to the state it was in before being var_dumped. – Chuck Burgess Aug 20 '10 at 16:08
  • Is it just me or is the "When var_dumped:" example not actually what would be dumped? – salathe Aug 20 '10 at 16:21
  • I've merged in [another question](http://stackoverflow.com/questions/23439260/unserialize-var-dumped-structure) to here, just FYI. – Andrew Barber May 13 '14 at 14:28
  • I think this can help http://stackoverflow.com/questions/4345554/convert-php-object-to-associative-array – Mamoon Rashid Oct 31 '16 at 15:31

7 Answers


var_export or serialize is what you're looking for. var_export renders PHP-parsable array syntax, and serialize renders a reversible (though not very human-readable) "array to string" conversion...
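For reference, here is a minimal serialize()/unserialize() round trip with the array from the question. The printed string is the serialize format that the challenge version below targets:

```php
<?php
$data = ['this' => ['is' => 'the'], 'challenge' => ['for' => ['you']]];

// serialize() turns the value into a string; unserialize() reverses it.
$stored = serialize($data);
echo $stored, "\n";
// a:2:{s:4:"this";a:1:{s:2:"is";s:3:"the";}s:9:"challenge";a:1:{s:3:"for";a:1:{i:0;s:3:"you";}}}

$restored = unserialize($stored);
var_dump($restored === $data); // bool(true)
```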

Edit: Alright, for the challenge:

Basically, I convert the output into a serialized string (and then unserialize it). I don't claim this to be perfect, but it appears to work on some pretty complex structures that I've tried...

function unvar_dump($str) {
    if (strpos($str, "\n") === false) {
        //Add new lines:
        $regex = array(
            '#(\\[.*?\\]=>)#',
            '#(string\\(|int\\(|float\\(|array\\(|NULL|object\\(|})#',
        );
        $str = preg_replace($regex, "\n\\1", $str);
        $str = trim($str);
    }
    $regex = array(
        '#^\\040*NULL\\040*$#m',
        '#^\\s*array\\((.*?)\\)\\s*{\\s*$#m',
        '#^\\s*string\\((.*?)\\)\\s*(.*?)$#m',
        '#^\\s*int\\((.*?)\\)\\s*$#m',
        '#^\\s*bool\\(true\\)\\s*$#m',
        '#^\\s*bool\\(false\\)\\s*$#m',
        '#^\\s*float\\((.*?)\\)\\s*$#m',
        '#^\\s*\[(\\d+)\\]\\s*=>\\s*$#m',
        '#\\s*?\\r?\\n\\s*#m',
    );
    $replace = array(
        'N',
        'a:\\1:{',
        's:\\1:\\2',
        'i:\\1',
        'b:1',
        'b:0',
        'd:\\1',
        'i:\\1',
        ';'
    );
    $serialized = preg_replace($regex, $replace, $str);
    $func = create_function(
        '$match', 
        'return "s:".strlen($match[1]).":\\"".$match[1]."\\"";'
    );
    $serialized = preg_replace_callback(
        '#\\s*\\["(.*?)"\\]\\s*=>#', 
        $func,
        $serialized
    );
    $func = create_function(
        '$match', 
        'return "O:".strlen($match[1]).":\\"".$match[1]."\\":".$match[2].":{";'
    );
    $serialized = preg_replace_callback(
        '#object\\((.*?)\\).*?\\((\\d+)\\)\\s*{\\s*;#', 
        $func, 
        $serialized
    );
    $serialized = preg_replace(
        array('#};#', '#{;#'), 
        array('}', '{'), 
        $serialized
    );

    return unserialize($serialized);
}

I tested it on a complex structure such as:

array(4) {
  ["foo"]=>
  string(8) "Foo"bar""
  [0]=>
  int(4)
  [5]=>
  float(43.2)
  ["af"]=>
  array(3) {
    [0]=>
    string(3) "123"
    [1]=>
    object(stdClass)#2 (2) {
      ["bar"]=>
      string(4) "bart"
      ["foo"]=>
      array(1) {
        [0]=>
        string(2) "re"
      }
    }
    [2]=>
    NULL
  }
}
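The regex table works because each scalar var_dump() line has a one-to-one serialize() counterpart, and both formats carry the string's byte length. Below is a sketch of that mapping for single scalar lines only (no arrays or objects). `scalarDumpToSerialized` is a hypothetical name and is not part of the answer's code:

```php
<?php
// Hypothetical helper: turn one scalar var_dump() line into the matching
// serialize() token, the same mapping unvar_dump() applies with regexes.
function scalarDumpToSerialized(string $line): string
{
    $line = trim($line);
    if ($line === 'NULL') {
        return 'N;';
    }
    if ($line === 'bool(true)') {
        return 'b:1;';
    }
    if ($line === 'bool(false)') {
        return 'b:0;';
    }
    if (preg_match('/^int\((-?\d+)\)$/', $line, $m)) {
        return 'i:' . $m[1] . ';';
    }
    if (preg_match('/^float\(([^)]+)\)$/', $line, $m)) {
        return 'd:' . $m[1] . ';';
    }
    // serialize() also stores the byte length, so the declared length can be
    // reused as-is and quotes inside the value need no escaping.
    if (preg_match('/^string\((\d+)\) "(.*)"$/s', $line, $m)) {
        return 's:' . $m[1] . ':"' . $m[2] . '";';
    }
    throw new InvalidArgumentException("Unsupported line: $line");
}

echo scalarDumpToSerialized('string(8) "Foo"bar""'), "\n"; // s:8:"Foo"bar"";
var_dump(unserialize(scalarDumpToSerialized('float(43.2)'))); // float(43.2)
```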
Pang
ircmaxell
  • @Gordon you beat me to it. I was just going back to edit those links in. Thanks! – ircmaxell Aug 20 '10 at 14:38
  • I think you misunderstood the question. The challenge is to reverse the var_dump into an array. I am familiar with serialize() and unserialize()... and yes, they are by far better options. This is a code challenge. Maybe it's not worth the effort, but I wanted to see if it could be done in an optimized and creative way. I am not looking for an alternative solution. – Chuck Burgess Aug 20 '10 at 14:41
  • @cdburgess: It is strange, what do you want to do exactly? – Sarfraz Aug 20 '10 at 14:43
  • The challenge is to take the output of var_dump and print out the rebuilt array. So going from `array(2) { ["this"]=> array(1) {...` back to `array('this' => array(` – Chuck Burgess Aug 20 '10 at 14:46
  • @cdburgess: So the title of your question should be **Code Challenge - Convert var_dump back to array/variable** – Sarfraz Aug 20 '10 at 14:50
  • Looks great. However, when I paste your code into a file, it will not execute. – Chuck Burgess Aug 20 '10 at 16:05
  • Are you on php 5.2? Because that code is written for 5.3+ (If you want to change it back, you'll need to change the `$foo = function` calls to create_function). I'll whip up the quick change and edit back in... – ircmaxell Aug 20 '10 at 16:08
  • And I just edited back in a far more robust version of the regexps that should account for strings with serialized tokens inside of them... – ircmaxell Aug 20 '10 at 16:21
  • PHP Notice: unserialize(): Error at offset 0 of 208 bytes in /home/y/share/htdocs/test.php on line 51 ... however, I am using a slightly different version of the var_dump. `$export = 'array(2) { ["this"]=> array(2) { ["is"]=> string(3) "the" [0]=> array(2) { [0]=> string(3) "one" [1]=> string(4) "only" } } ["challenge"]=> array(1) { ["for"]=> array(2) { [0]=> string(3) "you" [1]=> int(2) } } }';` – Chuck Burgess Aug 20 '10 at 18:48
  • Are there new lines (like `var_dump` provides)? Or did you just make it into a single line string (which makes the parsing a lot harder to do as robust)... – ircmaxell Aug 20 '10 at 18:56
  • @cdburgess: But that's not how PHP outputs a var_dump. There are linebreaks in it. And my solution depends upon those linebreaks. try doing:`ob_start(); var_dump($var); $data = ob_get_clean();` and then calling my function with `$data`... – ircmaxell Aug 21 '10 at 13:12
  • If it is output to a webpage it does. But thanks for the clarification. I will update the question so it is more clear. – Chuck Burgess Aug 21 '10 at 13:17
  • @cdburgess: Wrap it in `<pre>` tags. You'll see that there are new lines... Otherwise, it's not truly the output of `var_dump` (since its output includes new lines, and removing them changes the output)... – ircmaxell Aug 21 '10 at 13:30
  • @cdburgess: Ok, I added some support for the dump on a single line. Be aware that this may wind up changing the strings if they have any of the "tokens" inside of them (and hence break the serialized output)... It'll be robust if there are new lines, but if there are not it may stumble more... – ircmaxell Aug 21 '10 at 14:10
  • Hi, your method is not working on a Flickr var_dump array. Warning: strpos() expects parameter 1 to be string, array given in /opt/lampp/htdocs/phpflickr/example.php on line 28 Notice: Array to string conversion in /opt/lampp/htdocs/phpflickr/example.php on line 59 Warning: unserialize() expects parameter 1 to be string, array given in /opt/lampp/htdocs/phpflickr/example.php on line 84 – Rahul Mandaliya Nov 21 '14 at 19:01
  • @ircmaxell Hi there, I tried to test your code but couldn't get it to work; would you mind shedding some light? http://sandbox.onlinephpfunctions.com/code/1c73954f01598f137fe806d16fb151ea20ebe2df – SML Jun 14 '16 at 05:15
  • Not compatible since PHP 7.2.4 – Black Feb 19 '21 at 14:01

There's no other way than parsing manually, type by type. I didn't add support for objects, but it's very similar to the array case; you just need some reflection magic to populate non-public properties as well and to avoid triggering the constructor.

EDIT: Added support for objects... Reflection magic...

function unserializeDump($str, &$i = 0) {
    $strtok = substr($str, $i);
    switch ($type = strtok($strtok, "(\n")) { // get type, before first parenthesis or newline (so a bare NULL line is recognized too)
         case "bool":
             return strtok(")") === "true"?(bool) $i += 10:!$i += 11;
         case "int":
             $int = (int)substr($str, $i + 4);
             $i += strlen($int) + 5;
             return $int;
         case "string":
             $i += 11 + ($len = (int)substr($str, $i + 7)) + strlen($len);
             return substr($str, $i - $len - 1, $len);
         case "float":
             return (float)($float = strtok(")")) + !$i += strlen($float) + 7;
         case "NULL":
             return NULL;
         case "array":
             $array = array();
             $len = (int)substr($str, $i + 6);
             $i = strpos($str, "\n", $i) - 1;
             for ($entries = 0; $entries < $len; $entries++) {
                 $i = strpos($str, "\n", $i);
                 $indent = -1 - (int)$i + $i = strpos($str, "[", $i);
                 // get key int/string
                 if ($str[$i + 1] == '"') {
                     // use longest possible sequence to avoid key and dump structure collisions
                     $key = substr($str, $i + 2, - 2 - $i + $i = strpos($str, "\"]=>\n  ", $i));
                 } else {
                     $key = (int)substr($str, $i + 1);
                     $i += strlen($key);
                 }
                 $i += $indent + 5; // jump line
                 $array[$key] = unserializeDump($str, $i);
             }
             $i = strpos($str, "}", $i) + 1;
             return $array;
         case "object":
             $reflection = new ReflectionClass(strtok(")"));
             $object = $reflection->newInstanceWithoutConstructor();
             $len = !strtok("(") + strtok(")");
             $i = strpos($str, "\n", $i) - 1;
             for ($entries = 0; $entries < $len; $entries++) {
                 $i = strpos($str, "\n", $i);
                 $indent = -1 - (int)$i + $i = strpos($str, "[", $i);
                 // use longest possible sequence to avoid key and dump structure collisions
                 $key = substr($str, $i + 2, - 2 - $i + $i = min(strpos($str, "\"]=>\n  ", $i)?:INF, strpos($str, "\":protected]=>\n  ", $i)?:INF, $priv = strpos($str, "\":\"", $i)?:INF));
                 if ($priv == $i) {
                     $ref = new ReflectionClass(substr($str, $i + 3, - 3 - $i + $i = strpos($str, "\":private]=>\n  ", $i)));
                     $i += $indent + 13; // jump line
                 } else {
                     $i += $indent + ($str[$i+1] == ":"?15:5); // jump line
                     $ref = $reflection;
                 }
                 $prop = $ref->getProperty($key);
                 $prop->setAccessible(true);
                 $prop->setValue($object, unserializeDump($str, $i));
             }
             $i = strpos($str, "}", $i) + 1;
             return $object;

    }
    throw new Exception("Type not recognized...: $type");
}

(There are a lot of "magic" numbers used when incrementing the string position counter $i; they are mostly just the string lengths of the keywords plus parentheses etc.)
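The same type-by-type parsing can be sketched without hard-coded offsets. Each token is matched with preg_match() anchored at the current position via `\G`, and the position advances by the length of the match. `parseDump` is a hypothetical name. This sketch handles NULL, bool, int, finite floats, strings and arrays (not objects). It accepts `\n` or `\r\n` line breaks as well as single-line dumps. Like the other answers, it cannot disambiguate a string key that itself contains `"]=>`:

```php
<?php
// Hypothetical recursive-descent parser for var_dump() text.
// Handles NULL, bool, int, finite floats, strings and arrays (no objects).
function parseDump(string $s, int &$pos = 0)
{
    // Skip newlines/indentation ("\n" or "\r\n") before the next token.
    $pos += strspn($s, " \t\r\n", $pos);

    if (preg_match('/\GNULL/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return null;
    }
    if (preg_match('/\Gbool\((true|false)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return $m[1] === 'true';
    }
    if (preg_match('/\Gint\((-?\d+)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (int) $m[1];
    }
    if (preg_match('/\Gfloat\(([^)]*)\)/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        return (float) $m[1];
    }
    if (preg_match('/\Gstring\((\d+)\) "/', $s, $m, 0, $pos)) {
        // Read exactly the declared number of bytes, so quotes or "]=>"
        // inside the value cannot confuse the parser.
        $len = (int) $m[1];
        $start = $pos + strlen($m[0]);
        $pos = $start + $len + 1; // also skip the closing quote
        return substr($s, $start, $len);
    }
    if (preg_match('/\Garray\((\d+)\) \{/', $s, $m, 0, $pos)) {
        $pos += strlen($m[0]);
        $result = [];
        for ($n = (int) $m[1]; $n > 0; $n--) {
            $pos += strspn($s, " \t\r\n", $pos);
            // Integer key, or string key up to the first '"]=>'.
            if (!preg_match('/\G\[(?:(-?\d+)|"(.*?)")\]=>/s', $s, $k, 0, $pos)) {
                throw new UnexpectedValueException("Expected a key at offset $pos");
            }
            $pos += strlen($k[0]);
            $key = $k[1] !== '' ? (int) $k[1] : $k[2];
            $result[$key] = parseDump($s, $pos);
        }
        $pos += strspn($s, " \t\r\n", $pos);
        $pos++; // skip the closing '}'
        return $result;
    }
    throw new UnexpectedValueException("Unknown token at offset $pos");
}

$data = ['q' => 'Foo"bar"', 'list' => [1, 2.5, null, true]];
ob_start();
var_dump($data);
var_dump(parseDump(ob_get_clean()) === $data); // bool(true)
```

Advancing by the length of each match keeps the offsets correct even if a keyword changes length, at the cost of one regex call per token.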

bwoebi
  • Thanks! I like your approach, but some strings don't get parsed correctly, for example: `'string(6) "ab};cd"'` returns `d"`. – gog May 13 '14 at 13:56
  • @georg oh, that was a dumb error: I had one `strlen()` too many, in the wrong place. Better? I just didn't notice it because I always tested with strings of length 1... – bwoebi May 13 '14 at 14:00
  • @bwoebi It seems `bool(true)` is not parsed correctly. I had included a fix for that in my edit. – user1460043 Mar 30 '15 at 01:11
  • @user1460043 yep, I saw that, but your fix wasn't exactly what it should have been… the solution was just to not pass the vars to strtok() again. – bwoebi Mar 30 '15 at 10:12
  • @bwoebi Yes, the `bool` case looks cleaner now and works. But the `float` case seems wrong now, try `float(1.5)`. – user1460043 Mar 31 '15 at 02:43
  • @bwoebi I tried to test your code with var_dump of various arrays but couldn't get it to work; would you mind shedding some light? http://sandbox.onlinephpfunctions.com/code/583679cb1f6d5a1d816069daba4b52c670039a90 – SML Jun 14 '16 at 05:26
  • @SML you are using \r\n linebreaks … you'll need to replace the inputs linebreaks by \n. (or update all the offsets in the code responsible for line counting...) – bwoebi Jun 14 '16 at 12:35
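The "reflection magic" for objects comes down to ReflectionClass::newInstanceWithoutConstructor() plus ReflectionProperty::setValue(). Here is a minimal sketch, assuming a hypothetical `Point` class whose constructor must not run and a hypothetical `hydrate()` helper:

```php
<?php
// Hypothetical class whose constructor must not run during rebuilding.
class Point
{
    private int $x = 0;
    protected int $y = 0;

    public function __construct()
    {
        throw new LogicException('constructor should not run');
    }

    public function coords(): array
    {
        return [$this->x, $this->y];
    }
}

// Create an object without calling its constructor, then assign
// properties regardless of their visibility.
function hydrate(string $class, array $props): object
{
    $ref = new ReflectionClass($class);
    $obj = $ref->newInstanceWithoutConstructor();
    foreach ($props as $name => $value) {
        $prop = $ref->getProperty($name);
        if (PHP_VERSION_ID < 80100) {
            $prop->setAccessible(true); // required before PHP 8.1 only
        }
        $prop->setValue($obj, $value);
    }
    return $obj;
}

$p = hydrate(Point::class, ['x' => 3, 'y' => 4]);
echo implode(',', $p->coords()), "\n"; // 3,4
```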

If you want to encode/decode an array like this, you could use var_export(), which generates output in PHP's array syntax. For instance,

array (
  1 => 'foo',
  2 => 'bar',
)

could be the result of it. You would then have to use eval() to get the array back, though, and that is potentially dangerous: eval() really executes PHP code, so a simple code injection could let attackers gain control over your PHP script.
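Here is a sketch of that round trip, assuming the input is trusted. var_export() with its second argument set to true returns the code instead of printing it:

```php
<?php
$data = [1 => 'foo', 2 => 'bar'];

// var_export() with return=true gives PHP source code as a string.
$code = var_export($data, true);
echo $code, "\n";
// array (
//   1 => 'foo',
//   2 => 'bar',
// )

// eval() runs that source to get the value back: trusted input only.
$restored = eval('return ' . $code . ';');
var_dump($restored === $data); // bool(true)
```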

Even better solutions are serialize(), which creates a serialized version of any array or object, and json_encode(), which encodes any array or object in the JSON format (preferred for data exchange between different languages).
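Here is a minimal json_encode()/json_decode() round trip. Passing true as json_decode()'s second argument returns associative arrays rather than stdClass objects:

```php
<?php
$data = ['this' => ['is' => 'the'], 'challenge' => ['for' => ['you']]];

$json = json_encode($data);
echo $json, "\n"; // {"this":{"is":"the"},"challenge":{"for":["you"]}}

// true = decode JSON objects into associative arrays.
$restored = json_decode($json, true);
var_dump($restored === $data); // bool(true)
```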

Frxstrem

The trick is to match the input as chunks of either code or "strings": leave the strings alone (apart from switching them to single quotes) and apply the replacements only to the code chunks:

$out = preg_replace_callback('/"[^"]*"|[^"]+/','repl',$in);

function repl($m)
{
    return $m[0][0]=='"'?
        str_replace('"',"'",$m[0])
    :
        str_replace("(,","(",
            preg_replace("/(int\((\d+)\)|\s*|(string|)\(\d+\))/","\\2",
                strtr($m[0],"{}[]","(), ")
            )
        );
}

outputs:

array('this'=>array('is'=>'the'),'challenge'=>array('for'=>array(0=>'you')))

(removing ascending numeric keys starting at 0 takes a little extra accounting, which can be done in the repl function.)

PS: This doesn't solve the problem of strings containing ", but since var_dump doesn't seem to escape string contents, there is no way to solve that reliably. (You could match \["[^"]*"\], but a string may contain "] as well.)

mvds
  • This is great! You are one of the few who actually read and understood the question. Thanks for taking the challenge and providing a working solution. Now what if there is an INT(5) as the value? (i.e. `array('you',2)`) It will be displayed as int(5) but should return from your function as 5. – Chuck Burgess Aug 20 '10 at 15:40
  • I just took your example to make it work. Replacing `int\(\d+\)` with the number doesn't sound like much of a challenge. see updated answer. – mvds Aug 20 '10 at 16:05
  • Superb! Very well done and in small optimized code! FYI: There is a missing comma after "\\2". – Chuck Burgess Aug 20 '10 at 18:55

Use a regexp to change array(.) { (.*) } into array($1) and eval the code. This is not as easy as it sounds, because you have to deal with matching brackets etc.; it's just a clue on how to find a solution ;)

  • This will be helpful if you can't change var_dump to var_export or serialize.
canni
  • A regexp solution is going to be very difficult because you can have nested braces... So it's more likely to involve a string parser than a regexp (considering you have state to worry about due to the nesting)... – ircmaxell Aug 20 '10 at 14:37
  • No, you do not have to deal with a string parser; regexps have some superb features such as ungreedy/global flags etc. It can be done with one single regexp with the correct flags set :) – canni Aug 20 '10 at 14:41
  • BBCode parsers are built on top of regexps and work well without a state machine ;) just consider 'array(.) {' and '}' as open/close tags :) – canni Aug 20 '10 at 14:45
  • Then show me a single regex that will convert all valid var_dumped data back into native parsable php... I'll admit I'm wrong if you can show me an example of a regex that can deal with: `array(1) { ["foo}[bar]"] => string(4) "baz{" }` – ircmaxell Aug 20 '10 at 14:51
  • You're probably right that it can't be done with just one regexp, but you can still use one regexp per "tag", where a tag is one of: array(.) ; string(.) ; integer(.) etc., and parse the output in the correct order (simple types -> arrays). Still, it is not possible to "reparse" var_dumped objects and other non-standard structures; for those we have serialize and other stuff – canni Aug 20 '10 at 15:09
  • note that cdburgess is looking for a code challenge, so i'm putting some clues on how it can be achieved :) – canni Aug 20 '10 at 15:10
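This exchange cuts both ways. PCRE's recursive subpattern `(?R)` does let a single regex match nested braces, but because it knows nothing about quoting, braces inside dumped keys or values still break it. Here is a sketch with a hypothetical `firstBraceBlock()` helper, using ircmaxell's counterexample:

```php
<?php
// Hypothetical helper: return the first balanced {...} block in a dump.
// (?R) recurses into the whole pattern, so nested blocks are matched;
// the possessive ++ avoids needless backtracking.
function firstBraceBlock(string $dump): ?string
{
    return preg_match('/\{(?:[^{}]++|(?R))*\}/', $dump, $m) ? $m[0] : null;
}

// Nested blocks are fine:
echo firstBraceBlock('array(1) { ["a"]=> array(1) { [0]=> int(1) } }'), "\n";
// { ["a"]=> array(1) { [0]=> int(1) } }

// ...but a brace inside a key ends the match early, because the pattern
// does not know about quoting:
echo firstBraceBlock('array(1) { ["foo}[bar]"] => string(4) "baz{" }'), "\n";
// { ["foo}
```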

I think you are looking for the serialize function:

serialize — Generates a storable representation of a value

It allows you to save the contents of an array as a string and later read the array back with the unserialize function.

Using these functions, you can store/retrieve arrays in text/flat files as well as in a database.

Sarfraz

Updated to not use create_function, which is deprecated as of PHP 7.2.0 (and removed in PHP 8.0.0); anonymous functions are used instead:

    function unvar_dump($str) {
        if (strpos($str, "\n") === false) {
            //Add new lines:
            $regex = array(
                '#(\[.*?\]=>)#',
                '#(string\(|int\(|float\(|array\(|NULL|object\(|})#',
            );
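            // Double-quoted string: "\\1" is the regex backreference ("\1" would be an octal escape, chr(1)).
            $str = preg_replace($regex, "\n\\1", $str);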
            $str = trim($str);
        }
        $regex = array(
            '#^\040*NULL\040*$#m',
            '#^\s*array\((.*?)\)\s*{\s*$#m',
            '#^\s*string\((.*?)\)\s*(.*?)$#m',
            '#^\s*int\((.*?)\)\s*$#m',
            '#^\s*bool\(true\)\s*$#m',
            '#^\s*bool\(false\)\s*$#m',
            '#^\s*float\((.*?)\)\s*$#m',
            '#^\s*\[(\d+)\]\s*=>\s*$#m',
            '#\s*?\r?\n\s*#m',
        );
        $replace = array(
            'N',
            'a:\1:{',
            's:\1:\2',
            'i:\1',
            'b:1',
            'b:0',
            'd:\1',
            'i:\1',
            ';'
        );
        $serialized = preg_replace($regex, $replace, $str);
        $func = function($match) {
            return 's:'.strlen($match[1]).':"'.$match[1].'"';
        };
        $serialized = preg_replace_callback(
            '#\s*\["(.*?)"\]\s*=>#', 
            $func,
            $serialized
        );
        $func = function($match) {
            return 'O:'.strlen($match[1]).':"'.$match[1].'":'.$match[2].':{';
        };
        $serialized = preg_replace_callback(
            '#object\((.*?)\).*?\((\d+)\)\s*{\s*;#', 
            $func, 
            $serialized
        );
        $serialized = preg_replace(
            array('#};#', '#{;#'), 
            array('}', '{'), 
            $serialized
        );

        return unserialize($serialized);
    }

    // The values below are redacted ("xxx"); the declared string lengths
    // must match the redacted values, or unserialize() will fail.
    $test = 'array(10) {
      ["status"]=>
      string(1) "1"
      ["transactionID"]=>
      string(7) "1532xxx"
      ["orderID"]=>
      string(7) "1532xxx"
      ["value"]=>
      string(7) "0.73xxx"
      ["address"]=>
      string(1) "-"
      ["confirmations"]=>
      string(3) "999"
      ["transaction_hash"]=>
      string(9) "internxxx"
      ["notes"]=>
      string(0) ""
      ["txCost"]=>
      string(1) "0"
      ["txTimestamp"]=>
      string(10) "1532078165"
    }';
    var_export(unvar_dump($test));