
I'm looking for a way to serialize large arrays to a file in PHP.

Right now I use a simple JSON format. Unfortunately, to store JSON in a file you first need to convert the array to a string with `json_encode` and then write that string to the file. During this process the amount of used memory almost doubles (in practice a bit less), and in some cases that is a problem when several of these jobs run concurrently.
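To illustrate, this is roughly what happens now (simplified; the array and the file name are just placeholders):

    // Simplified illustration of the current approach: json_encode() builds the
    // whole JSON string in memory, so for a moment the array and its encoded
    // copy exist side by side before anything reaches the disk.
    $data = range(1, 1000000);             // stands in for the real large array
    $json = json_encode($data);            // full string held in memory here
    file_put_contents('large-array.json', $json);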

My question is: is there a PHP library (binary preferably) which can serialize an array straight to a file (a JSON format would be nice) without first converting the data to a string and thus 'doubling' the memory? If the output can be compressed with GZIP, that would be even better.

Any other suggestions for writing (and reading) large objects without an intermediate format/state are welcome too.

ddofborg
  • Is there a reason why you prefer JSON? Just speculating but perhaps `serialize()` or `var_export()` is faster? – Darragh Enright Feb 12 '19 at 22:47
  • I prefer JSON because it's much smaller than `serialize()`. – Wiimm Feb 12 '19 at 22:50
  • Fair enough :) I figured that if the file is not being read by humans or other code then it might be worth benchmarking different options and making a decision between filesize and readability, if that trade-off exists. – Darragh Enright Feb 12 '19 at 22:54
  • To parse a large JSON document efficiently you can use something like [salsify/json-streaming-parser](https://github.com/salsify/jsonstreamingparser), but I'm not aware of an equivalent for writing out a large document. – Sammitch Feb 13 '19 at 00:50
  • Ok, you nerd-sniped me. https://packagist.org/packages/wrossmann/json_stream – Sammitch Feb 13 '19 at 04:14
  • `jsonstreamingparser` is too slow unfortunately. JSON is not required, but as long as an intermediate string is created, the problem will still be there, I presume. – ddofborg Feb 13 '19 at 10:37

1 Answer


If memory is the only concern

At the risk of being called Captain Obvious - I'd like to suggest a weird approach I like to use when there's not enough memory and I have to deal with something that only just fits in. Also, if garbage collection doesn't kick in, that can be solved by doing the job in several steps, as this article explains.

What I mean is something like this:

    function packWithoutExhaustingMemory(array $a) {
        foreach ($a as $key => $value) {
            $a[$key] = gzcompress(serialize($value)); // but only one piece at a time!
        }
        return $a;
    }

Again, I'm not sure if this exact piece will do the job, but it illustrates the concept.
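Building on that, here is a rough sketch (not tested; the function name and the length-prefix framing are my own assumptions) of pushing each compressed piece straight to disk, so the full serialized string never has to exist in memory at once:

    // Sketch: pack and write one element at a time; only a single chunk is
    // ever duplicated in memory.
    function writeChunksToFile(array $a, string $path) {
        $fp = fopen($path, 'wb');
        foreach ($a as $key => $value) {
            $chunk = gzcompress(serialize([$key => $value]));
            // Length-prefix each chunk so it can be read back piece by piece.
            fwrite($fp, pack('N', strlen($chunk)) . $chunk);
        }
        fclose($fp);
    }

Reading it back would mirror this: read 4 bytes for the length, then that many bytes, and run `unserialize(gzuncompress(...))` on each chunk. If a single gzip file is preferred, write uncompressed chunks through `gzopen()`/`gzwrite()` instead of the plain `fopen()`/`fwrite()` calls.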

dkellner