
I want to read a binary file as a byte[] in PHP. As suggested here, I unpacked the output of fread, so I have something like:

$file = fopen($filename, 'rb');      // 'b' opens the file in binary mode
fseek($file, $offset);               // the file is 500MB, so I read it 10MB at a time
$tmp = fread($file, $len);
// So far so good: $tmp holds 10MB of data
var_dump(strlen($tmp));              // int(10485760), i.e. 10MB
var_dump(memory_get_usage(true));    // int(11272192), i.e. 11MB
$data = unpack('C*', $tmp);

This throws:

PHP Fatal error:  Allowed memory size of 536870912 bytes exhausted (tried to allocate 32 bytes) in [myfile.php] on line [unpack line]

As the error shows, the memory limit is set to 512MB, and according to memory_get_usage only 11MB of it was in use when I tried to unpack the 10MB string. At most it should need 30MB (10MB for $tmp, 10MB for $data and 10MB for internal variables). Why does it blow up, unable to unpack $tmp with 512MB of memory?

So the question is: am I doing something wrong here, or is it a bug? Is there any other way to get an array of bytes (0 to 255) when reading binary files in PHP, or should I switch to another language to do this?

Additional notes: the code works with a 117KB file.

php -v
PHP 5.5.3-1ubuntu2.2 (cli) (built: Feb 28 2014 20:06:05) 
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2013 Zend Technologies
    with Zend OPcache v7.0.3-dev, Copyright (c) 1999-2013, by Zend Technologies
Bor691

1 Answer


In PHP, variables are stored internally as zvals, so each element in the array takes significantly more memory than you expect. This is because PHP is a weakly typed language and therefore needs to be able to quickly change the type of a variable internally. There is also the overhead of GC, and the fact that an array in PHP is really a hash table.

You can find in-depth details here:

http://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

However, in essence, creating an array of 10485760 elements would require approx. 760MB on 32-bit and 1440MB on 64-bit (about 76 and 144 bytes per element respectively, according to the article above).
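
The arithmetic behind those figures can be sketched as follows. The per-element costs are the ones the linked article reports for PHP 5.x and are treated here as assumptions; estimateArrayMb is a hypothetical helper name used only for illustration.

```php
<?php
// Per-element array cost in bytes, as reported in the linked article
// for PHP 5.x (assumed figures, not measured here).
$bytesPerElement = array('32-bit' => 76, '64-bit' => 144);

// unpack('C*', ...) creates one array element per byte of the 10MB chunk.
$elements = 10485760;

// Hypothetical helper: estimated array size in MB.
function estimateArrayMb($elements, $bytesPerElement)
{
    return $elements * $bytesPerElement / (1024 * 1024);
}

foreach ($bytesPerElement as $arch => $bytes) {
    printf("%s: approx. %d MB\n", $arch, estimateArrayMb($elements, $bytes));
}
```

Either estimate is well above the 512MB limit, which is why unpack fails even though the source string is only 10MB.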

Your best option is most likely not to unpack the string at all. Instead, when you need a certain element of the would-be array, just access the corresponding position in the string.
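
A minimal sketch of that idea, assuming the chunk is already in a string such as $tmp: ord() turns the character at a given offset into an integer from 0 to 255, so no array is ever built. The helper name byteAt is hypothetical, and the short literal string only stands in for the data returned by fread.

```php
<?php
// Hypothetical helper: returns the byte (0-255) at $index of a binary
// string, reading it directly from the string instead of from an array.
function byteAt($data, $index)
{
    if ($index < 0 || $index >= strlen($data)) {
        throw new OutOfRangeException("Index $index is out of range");
    }
    return ord($data[$index]);
}

// Stand-in for a chunk returned by fread().
$tmp = "\x00\x7F\xFF" . "A";

// Walk the bytes one at a time; memory use stays at the size of $tmp.
for ($i = 0, $n = strlen($tmp); $i < $n; $i++) {
    echo byteAt($tmp, $i), "\n";
}
```

Note that unpack('C*', ...) returns an array indexed from 1, whereas string offsets start at 0, so indices shift by one when switching between the two approaches.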

For example, here is a library that uses this concept:

https://github.com/reiner-dolp/PHP-Memory-Efficient-Arrays

Tomdarkness
    I've looked at the "memory efficient arrays" library and it seems promising. Just one question: how can I convert a 10MB string to a ByteArray? In other words, is there a replacement for the unpack function that works with the library you mentioned? – Bor691 Apr 06 '14 at 14:52