I am testing this code to download big files in 10 MB chunks:

/**
 * Copy a remote file over HTTP one small chunk at a time.
 *
 * @param string $infile  Full URL of the remote file
 * @param string $outfile Path where to save the file
 */
function copyfile_chunked($infile, $outfile) {
    $chunksize = 10 * (1024 * 1024); // 10 Megs

    /**
     * parse_url() breaks a URL apart into its parts, i.e. host, path,
     * query string, etc.
     */
    $parts = parse_url($infile);
    // Note: fsockopen() takes the error number before the error string.
    $i_handle = fsockopen($parts['host'], 80, $errno, $errstr, 5);
    $o_handle = fopen($outfile, 'wb');

    if ($i_handle === false || $o_handle === false) {
        return false;
    }

    if (empty($parts['path'])) {
        $parts['path'] = '/';
    }
    if (!empty($parts['query'])) {
        $parts['path'] .= '?' . $parts['query'];
    }

    /**
     * Send the request to the server for the file
     */
    $request = "GET {$parts['path']} HTTP/1.1\r\n";
    $request .= "Host: {$parts['host']}\r\n";
    $request .= "User-Agent: Mozilla/5.0\r\n";
    $request .= "Keep-Alive: 115\r\n";
    $request .= "Connection: keep-alive\r\n\r\n";
    fwrite($i_handle, $request);

    /**
     * Now read the headers from the remote server. We'll need
     * to get the content length.
     */
    $headers = array();
    while(!feof($i_handle)) {
        $line = fgets($i_handle);
        if ($line == "\r\n") break;
        $headers[] = $line;
    }

    /**
     * Look for the Content-Length header, and get the size
     * of the remote file.
     */
    $length = 0;
    foreach($headers as $header) {
        if (stripos($header, 'Content-Length:') === 0) {
            // substr() keeps this working even when the server sends
            // the header name in a different letter case.
            $length = (int)trim(substr($header, strlen('Content-Length:')));
            break;
        }
    }

    /**
     * Start reading in the remote file, and writing it to the
     * local file one chunk at a time.
     */
    $cnt = 0;
    while(!feof($i_handle)) {
        $buf = fread($i_handle, $chunksize);
        $bytes = fwrite($o_handle, $buf);
        if ($bytes === false) {
            return false;
        }
        $cnt += $bytes;

        /**
         * We're done reading when we've reached the content length.
         */
        if ($cnt >= $length) break;
    }

    fclose($i_handle);
    fclose($o_handle);
    return $cnt;
}
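
I call it like this (the URL and output path here are just placeholders):

// Hypothetical call; replace with a real URL and a writable path.
$bytes = copyfile_chunked('http://example.com/images/photo.jpg', 'photo.jpg');
if ($bytes === false) {
    echo "Download failed\n";
} else {
    echo "Downloaded $bytes bytes\n";
}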

I am testing this code on a small image first. The image gets downloaded to my account, but in a corrupted form: all the bytes seem correct, except that the "0D" bytes are removed from the downloaded image, which renders it unusable.
Why is this happening, and how can I overcome it?
Thanks!

  • Do you end up reading as many bytes as `Content-Length` indicates? – Jon Nov 05 '12 at 23:16
  • Hi, Jon. Strange thing with that. The original image is 15444 bytes, the function returns that 15444 bytes were downloaded, but when I retrieve the downloaded image it proves to be only 15397 bytes, probably because of the missing "0D" bytes. – GreenBear Nov 05 '12 at 23:21
  • Is there a particular reason you don't use a working http client library? Your wacky header decoding isn't HTTP compliant. And not all responses come in chunked TE. – mario Nov 05 '12 at 23:24
  • Misconfigured server? Maybe? Does the server send the correct `Content-Type` for an image? – Jon Nov 05 '12 at 23:24
  • I just removed with Hex editor all the "0D" bytes from the original image. The image became 15399 bytes in size, so I guess some two more bytes are removed upon downloading. – GreenBear Nov 05 '12 at 23:29
  • Hi, mario. I use a free hosting service and there is a download limit set in the php.ini file which I can not change, but trying to work around. Do you know any other way to do that? – GreenBear Nov 05 '12 at 23:31
  • Yes, Jon, the Content-Type for the image as returned by the server where the image is stored is "image/jpeg". – GreenBear Nov 05 '12 at 23:34
  • Last bytes are completely identical in the original and the downloaded files, at least as far as I checked. – GreenBear Nov 05 '12 at 23:43
  • Everything seems to be right as far as setting that the data is to be interpreted as binary. The only other thing that strikes me as odd is using such a large 'chunk' size. It's a longshot, but what happens if you use 2 or 4 kB? – Sammitch Nov 05 '12 at 23:57

1 Answer


Good day everybody, and thanks for the help.
The problem is resolved now and the culprit is identified.
I had been reading some books and found this:
ftp_get() copies a file on the remote server to your computer. The FTP_ASCII parameter transfers the file as if it were ASCII text. Under this option, linefeed endings are automatically converted as you move from one operating system to another. The other option is FTP_BINARY, which is used for nonplaintext files, so no linefeed conversions take place.
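To illustrate the difference, a binary-safe download with ftp_get() would look something like this (just a sketch; the host, login and file names are made up):

// A sketch of a binary-safe FTP download; host, login and paths are made up.
$conn = ftp_connect('ftp.example.com');
if ($conn !== false && ftp_login($conn, 'username', 'password')) {
    // FTP_BINARY disables line-ending conversion, so 0x0D bytes survive.
    ftp_get($conn, 'local-copy.jpg', 'remote/photo.jpg', FTP_BINARY);
    ftp_close($conn);
}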
The code provided in my question works fine and it downloads the image correctly.
When I was checking the image, I was downloading it to my computer with a PHP-written file manager provided by my hosting company. They are apparently not very good at PHP: they used the FTP_ASCII parameter mentioned above to transfer a binary file, which stripped the "0D" (carriage return) bytes and corrupted the image.
When I downloaded the image directly from the FTP account, the image proved to be identical to the original.
So, ultimately, the problem was with PHP code, just not with the code that I put together.
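
And for anyone who can use a proper HTTP client, as mario suggested in the comments, a cURL version that streams the download straight to disk would look roughly like this (a sketch; the URL and output path are placeholders):

// Rough cURL equivalent; the URL and output path are placeholders.
$fp = fopen('photo.jpg', 'wb');
$ch = curl_init('http://example.com/images/photo.jpg');
curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body straight to the file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$ok = curl_exec($ch);
curl_close($ch);
fclose($fp);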
