5

I have a set of files that I want to concatenate (each represents a part from a multi-part download).

Each split file is about 250MiB in size, and I have a variable number of them.

My concatenation logic is straightforward:

if (is_resource($handle = fopen($output, 'xb')) === true) // create output, fail if it already exists
{
    foreach ($parts as $part)
    {
        if (is_resource($stream = fopen($part, 'rb')) === true)
        {
            while (feof($stream) !== true)
            {
                fwrite($handle, fread($stream, 4096)); // copy in 4KiB chunks
            }

            fclose($stream);
        }
    }

    fclose($handle);
}

It took me a while to track down but, apparently, whenever I have more than 8 individual parts (totaling more than 2GiB) my output file gets truncated to 2147483647 bytes (as reported by sprintf('%u', $output)).

I suppose this is due to some kind of 32-bit internal counter used by fopen() or fwrite().

How can I work around this problem (preferably using only PHP)?

Alix Axel • 151,645 • 95 gold • 393 silver • 500 bronze
  • This topic can be interesting: http://stackoverflow.com/questions/4229534/is-fopen-limited-by-the-filesystem – Lajos Veres Oct 17 '13 at 22:12
  • @LajosVeres: Thanks, reading it now. – Alix Axel Oct 17 '13 at 22:13
  • @LajosVeres: I think I'm running a 32-bit install (will check in a moment), but it's still a bit weird as my file is capped at 2GiB and not 4GiB. Since there's no such thing as a negative file offset, I assume `fopen()` / `fwrite()` would be smart enough to use unsigned 32-bit integers and allow me to write up to 4GiB (that would be sufficient in this case). – Alix Axel Oct 17 '13 at 22:20
  • 1
    @Sammitch: That is wrong. PHP only uses 32 bit integer on all versions of Windows. On Linux, 64 bit integers are used if possible. This is due to a limitation of the compiler used for the windows version, as stated on the mailing list: http://marc.info/?l=php-internals&m=137002754604365&w=2 – Sven Oct 17 '13 at 22:26
  • @Sven retracted. Has this also been fixed for receiving 2-4GB+ POST requests? – Sammitch Oct 17 '13 at 22:43
  • @Sammitch: Doesn't look like it: *right now PHP can't really handle strings >= 2^31 characters even on 64 bit compiles*. I'm also wondering if switching to a 64-bit OS will effectively raise the 2GiB limitation or not, as apparently the C functions need to play along: http://stackoverflow.com/a/730735/89771. – Alix Axel Oct 17 '13 at 22:47
  • I see that $output is a filename. Why are you trying to show it as integer (sprintf('%u', $output))? – sectus Oct 17 '13 at 23:42
  • @AlixAxel good, then I didn't *entirely* waste my life writing an upload handler in python for 40GB+ files. :P – Sammitch Oct 17 '13 at 23:50
  • @Sammitch: I'm also weighing porting this code to Python, but I have so much more code in PHP that I would favor a simpler solution for this. =) – Alix Axel Oct 18 '13 at 00:48
  • This link: http://uk1.php.net/manual/en/function.filesize.php says: "Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB." Maybe using file_get_contents could help (where you should not rely on the return values). – Lajos Veres Oct 18 '13 at 08:07

1 Answer

2

As a workaround, you could use the shell. If the code must be portable, you would only need two variants: one for Windows and one for Linux (the latter also covers macOS).

Linux

cat file1.txt file2.txt > file.txt

Windows

copy /b file1.txt+file2.txt file.txt

Note that when building the command line, escaping the variable arguments is very important. Use escapeshellarg() to wrap each filename (see http://de1.php.net/escapeshellarg).
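For example, the Linux command could be assembled like this (a minimal sketch; the filenames and the $output value are placeholders, not from the question):

```php
<?php
// Sketch: build the cat command with every filename shell-escaped.
// These filenames are placeholders standing in for the real $parts array.
$parts  = ['part with spaces.bin', "part'quote.bin"];
$output = 'file.bin';

$cmd = 'cat ' . implode(' ', array_map('escapeshellarg', $parts))
     . ' > ' . escapeshellarg($output);

echo $cmd, "\n";
```

Because escapeshellarg() quotes each argument, filenames containing spaces or quotes cannot break the command or inject extra shell syntax.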

To detect whether you are on Windows or Linux, have a look at the PHP_OS constant (best explained here: http://www.php.net/manual/en/function.php-uname.php).

Sven • 69,403 • 10 gold • 107 silver • 109 bronze