
I'm working with a large array that holds a 1024x1024 height map, and of course I've hit the memory limit. On my test machine I can raise the memory limit to 1 GB if I want, but on my tiny VPS with only 256 MB of RAM that's not an option.

I've been searching Stack Overflow and Google and found several answers along the lines of "well, you're not using PHP for its memory efficiency, so ditch it and rewrite in C++". Honestly, that's fair, and I recognize that PHP loves memory.

But when digging deeper into PHP memory management, I couldn't find how much memory each data type consumes, or whether casting to another data type reduces memory consumption.

The only "optimization" technique I found was to unset variables and arrays, and that's it.

Would converting the code to C++ with one of the PHP-to-C++ converters solve the problem?

Thanks!

Gabriel
  • Arrays are truly memory-hungry in PHP (as they are actually dictionaries). If you can give up some (lots of!) speed you can [fake binary arrays like in C](http://stackoverflow.com/questions/5505124/cheating-php-integers/5505643#5505643), also for 2D structures I guess. But maybe you really want to investigate the [HipHop PHP to C++ compiler](https://github.com/facebook/hiphop-php/wiki/). – mario Jun 13 '11 at 21:25
  • Every variable in PHP has overhead associated with it. Not only does the variable's value have to be stored, but the variable's name, type, etc... Even a simple `$x[1] = 2;` has a large body of extra stuff following it around. – Marc B Jun 13 '11 at 21:27
  • @mario I wonder why the linked post stopped at hex-encoding and does not directly use the full bytes of the string. It seems like a little math could actually be faster ... but I don't use PHP (It would have no regard for valid multibyte sequences and whatnot :-) –  Jun 13 '11 at 22:12
  • @pst: I do actually have another version using `pack()` for binary strings. But that's not really faster; just saves twice the memory. (There's only so much you can fake in PHP ;) – mario Jun 13 '11 at 22:22
  • How much memory usage do you need to cut? As you have read, there is little in the way of memory management that you can do in PHP. There are "optimizations" you can make, but probably nothing that's going to cut as much as you may need. – simshaun Jun 13 '11 at 21:26

3 Answers


If you want a real indexed array, use SplFixedArray. It uses less memory. Also, PHP 5.3 has a much better garbage collector.

Other than that, well, PHP will use more memory than a more carefully written C/C++ equivalent.

Memory usage for a 1024x1024 integer array, in bytes:

  • Standard array: 218,756,848
  • SplFixedArray: 92,914,208

as measured by memory_get_peak_usage()

$array = new SplFixedArray(1024 * 1024); // array();
for ($i = 0; $i < 1024 * 1024; ++$i)
  $array[$i] = 0;

echo memory_get_peak_usage();

Note that the same array in C using 64-bit integers would be 8 MB.

As others have suggested, you could pack the data into a string. This is slower but much more memory efficient. If using 8 bit values it's super easy:

$x = str_repeat(chr(0), 1024*1024);
$x[$i] = chr($v & 0xff); // store value $v into $x[$i]
$v = ord($x[$i]);        // get value $v from $x[$i]

Here total memory use will only be about 1.5 MB. That figure covers PHP's entire baseline overhead plus the 1 MB string that serves as the integer array.
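For a 2D height map, the same packed string can be addressed with a row-major offset. This is only a minimal sketch: the function and constant names are hypothetical (they are not from the answer), it assumes 8-bit heights, and it uses scalar type hints, so it needs PHP 7.1 or later.

```php
<?php
// Hypothetical helpers (not from the answer): a 1024x1024 height map
// stored as one byte per cell in a single PHP string, row-major.
const MAP_WIDTH  = 1024;
const MAP_HEIGHT = 1024;

function heightmap_create(): string
{
    // One zero byte per cell: 1024 * 1024 bytes, about 1 MB of payload.
    return str_repeat("\0", MAP_WIDTH * MAP_HEIGHT);
}

function heightmap_set(string &$map, int $x, int $y, int $value): void
{
    // Row-major offset; the value is truncated to 8 bits with & 0xff.
    $map[$y * MAP_WIDTH + $x] = chr($value & 0xff);
}

function heightmap_get(string $map, int $x, int $y): int
{
    // Passing the string by value does not copy it (copy-on-write).
    return ord($map[$y * MAP_WIDTH + $x]);
}

$map = heightmap_create();
heightmap_set($map, 10, 20, 200);
echo heightmap_get($map, 10, 20), "\n"; // prints 200
```

No bounds checking is done here, so out-of-range coordinates would silently read or write the wrong cell.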

For the fun of it, I created a simple benchmark of creating 1024x1024 8-bit integers and then looping through them once. The packed versions all used ArrayAccess so that the user code looked the same.

                   mem    write   read
array              218M   0.589s  0.176s
packed array       32.7M  1.85s   1.13s
packed spl array   13.8M  1.91s   1.18s
packed string      1.72M  1.11s   1.08s

The packed arrays used native 64-bit integers (only packing 7 bytes to avoid dealing with signed data) and the packed string used ord and chr. Obviously implementation details and computer specs will affect things a bit, but I would expect you to get similar results.
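The benchmark code itself was not posted, so the following is only a guess at what "packing 7 bytes into a native 64-bit integer" could look like. The function names are hypothetical, and the sketch assumes a 64-bit PHP build and PHP 7.1 or later (for `intdiv` and the type hints).

```php
<?php
// Hypothetical sketch: seven 8-bit values per 64-bit PHP integer.
// Using only 7 of the 8 bytes keeps the sign bit untouched.
const VALUES_PER_INT = 7;

function packed_create(int $count): SplFixedArray
{
    // Round up so that every value has a slot.
    $slots = intdiv($count + VALUES_PER_INT - 1, VALUES_PER_INT);
    $store = new SplFixedArray($slots);
    for ($i = 0; $i < $slots; ++$i) {
        $store[$i] = 0;
    }
    return $store;
}

function packed_set(SplFixedArray $store, int $index, int $value): void
{
    $slot  = intdiv($index, VALUES_PER_INT);
    $shift = ($index % VALUES_PER_INT) * 8;
    // Clear the old byte, then OR in the new one.
    $store[$slot] = ($store[$slot] & ~(0xff << $shift))
                  | (($value & 0xff) << $shift);
}

function packed_get(SplFixedArray $store, int $index): int
{
    $slot  = intdiv($index, VALUES_PER_INT);
    $shift = ($index % VALUES_PER_INT) * 8;
    return ($store[$slot] >> $shift) & 0xff;
}

$store = packed_create(1024 * 1024);
packed_set($store, 12345, 200);
echo packed_get($store, 12345), "\n"; // prints 200
```

Each access costs a division, a modulo, a shift, and a mask on top of the array lookup, which is consistent with the packed variants being slower than the plain array in the table above.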

So while the native array was about 6x faster on reads, it also used about 125x the memory of the next best alternative: packed strings. Obviously the speed is irrelevant if you are running out of memory. (When I used packed strings directly, without an ArrayAccess class, they were only 3x slower than native arrays.)
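The ArrayAccess wrapper used in the benchmark was not shown. A minimal sketch of one possible string-backed version follows; the class name `PackedByteArray` is hypothetical, and the method signatures assume PHP 8.0 or later (for the `mixed` type).

```php
<?php
// Hypothetical class (not from the answer): a byte array backed by a
// single string, exposed through ArrayAccess so it reads like an array.
final class PackedByteArray implements ArrayAccess, Countable
{
    private string $data;

    public function __construct(int $size)
    {
        $this->data = str_repeat("\0", $size);
    }

    private function check(mixed $offset): void
    {
        if (!$this->offsetExists($offset)) {
            throw new OutOfRangeException('Invalid offset');
        }
    }

    public function offsetExists(mixed $offset): bool
    {
        return is_int($offset) && $offset >= 0 && $offset < strlen($this->data);
    }

    public function offsetGet(mixed $offset): mixed
    {
        $this->check($offset);
        return ord($this->data[$offset]);
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        $this->check($offset);
        $this->data[$offset] = chr($value & 0xff); // keep the low 8 bits
    }

    public function offsetUnset(mixed $offset): void
    {
        $this->check($offset);
        $this->data[$offset] = "\0"; // "unset" resets the cell to zero
    }

    public function count(): int
    {
        return strlen($this->data);
    }
}

$heights = new PackedByteArray(1024 * 1024);
$heights[5] = 300;      // stored as 300 & 0xff
echo $heights[5], "\n"; // prints 44
```

Every read and write goes through a method call, which is the likely source of the extra slowdown compared with using the string directly.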

In short, I would use something other than pure PHP to process this data if speed is of any concern.

Matthew
  • +1 Additionally, emulating array indices and using packing may further reduce the memory usage, if applicable. E.g. if each height-map value is only 8 bits the *memory usage should be considerably less* when packed to 32bits (or 64bits depending upon PHP bitness). The exact gain in efficiency varies due to payload size/utilization vs. value maintenance overheads of the PHP values used. (I think there are 4 bytes of "overhead" per integer value, but I'm not entirely sure.) –  Jun 13 '11 at 22:02
  • Apparently there is more than 4 bytes of overhead ... [this post](http://stackoverflow.com/questions/5972170/what-is-the-overhead-of-using-php-int) suggests it may take upwards of 36 (or 72 on x64) bytes just for a trivial value. This indicates that it is *very beneficial* (in terms of memory usage) to pack. Assuming 8-bit input and a 32-bit arch, 4 values would take ~36 bytes packed vs. ~144 bytes unpacked, while on an x64 machine 8 values would take ~72 bytes packed vs. ~576 bytes unpacked! (Yikes!) –  Jun 13 '11 at 22:34
  • So, in conclusion ... with packing, 8 bit values are amortized to ~9 bytes for an off-the-cuff estimation of 9MB of object overhead/data, excluding memory required for inclusion in the array itself, etc -- quartering the numbers posted is ~22.5MB total usage. (Such packing may seem over-optimizing, but considering the target is limited to 256MB RAM .. ;-) –  Jun 13 '11 at 22:49
  • @pst, I've added a bit about packing the data into a string. When dealing with 8-bit integers (as perhaps the height map is) then a string integer array will basically be the same size as what the C equivalent would be. Of course, the speed will be much, much worse than the native integers. – Matthew Jun 13 '11 at 23:07
  • @konforce Packing into a PHP integral value :) Speed should be close to non-packed for *many* operations (just a mask and a shift extra) -- and much more efficient memory-wise (but not as efficient as kludging into a string). –  Jun 14 '11 at 02:31
  • @pst, Sorry I was confusing yours with mario's suggestions. Yes, with an SplFixedArray, it should be about 22MB if you pack 8 bit integers. It should be faster than the 1.5MB packed string equivalent, although one never knows with PHP. (e.g., The integers are signed, so dealing with the high bit could require more than simple shifts.) The fastest memory friendly solution would be to write a native integer array as a C extension and expose it as a PHP class. – Matthew Jun 14 '11 at 03:26
  • Err, I forgot my numbers were on 64-bit, so the SplFixedArray with packed integers should be around 12MB. – Matthew Jun 14 '11 at 03:31
  • SplFixedArray seems to offer the best benefit/work ratio (i'll have to rename all my String indexes to integer ones and that's it). Nicey! <3 – Gabriel Jun 14 '11 at 13:23

In addition to the accepted answer and the suggestions in the comments, I'd like to suggest the PHP Judy array implementation.

Quick tests showed interesting results. An array with 1 million entries using the regular PHP array data structure takes ~200 MB. SplFixedArray uses around 90 MB. Judy uses 8 MB. The tradeoff is performance: Judy takes about twice as long as the regular PHP array implementation.

N.B.
  • I'll check it out, nice nice! In my case, I can live with a performance hit in order to save some RAM. – Gabriel Oct 19 '11 at 15:29
  • Exactly what I need! [Judy Array](http://en.wikipedia.org/wiki/Judy_array) is awesome. High performance and low memory usage. – Tiago Fischer Dec 12 '12 at 13:11
  • @FlycKER - I'm glad someone decided to use this awesome array implementation :) – N.B. Dec 12 '12 at 13:46
  • Isn't the Judy Array just using a packed PHP string internally? – Pacerier Jul 13 '13 at 07:49
  • @N.B., no, but there doesn't seem to be another way of implementing it more efficiently in PHP is there? – Pacerier Jul 24 '13 at 04:52
  • @Pacerier - I don't really understand what you're asking.. Judy is an extension for PHP, all the memory allocation and work is managed by the extension and not PHP internal data structures, especially not the string. – N.B. Jul 24 '13 at 08:05

A little late to the party, but if you have a multidimensional array, you can save a lot of RAM by storing the inner arrays as JSON strings.

$array = [];

$data = [];
$data["a"] = "hello";
$data["b"] = "world";

To store this array just use:

$array[] = json_encode($data);

instead of

$array[] = $data;

If you want to get the array back, just use something like:

$myData = json_decode($array[0], true);

I had a big array with 275,000 sets and reduced RAM consumption by about 36%.

EDIT: I found an even better way: gzip the JSON string:

$array[] = gzencode(json_encode($data));

and unzip it when you need it:

$myData = json_decode(gzdecode($array[0]), true);

This saved me nearly 75% of RAM peak usage.
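The store and load steps can be wrapped in a pair of helpers so the calling code never touches the encoded form. This is a minimal sketch: the names `row_store` and `row_load` are hypothetical, and `gzencode`/`gzdecode` require the zlib extension.

```php
<?php
// Hypothetical helpers (not from the answer): keep each inner array as a
// gzip-compressed JSON string and decode it only when it is needed.
function row_store(array $row): string
{
    return gzencode(json_encode($row));
}

function row_load(string $blob): array
{
    // The `true` flag belongs to json_decode: return arrays, not objects.
    return json_decode(gzdecode($blob), true);
}

$rows = [];
$rows[] = row_store(["a" => "hello", "b" => "world"]);

$first = row_load($rows[0]);
echo $first["b"], "\n"; // prints world
```

gzip adds a fixed header and trailer to every string, so very small rows may not shrink at all. The savings quoted above presumably came from larger inner arrays, and every access pays for a decompress and a decode.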

Marco