Is it possible to partially download a remote file with cURL? Let's say the actual filesize of the remote file is 1000 KB. How can I download only the first 500 KB of it?
4 Answers
You can also set the Range header with the php-curl extension:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.spiegel.de/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');      // request bytes 0-500 (inclusive)
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body instead of printing it
$result = curl_exec($ch);
curl_close($ch);
echo $result;
But as noted before, if the server doesn't honor this header and sends the whole file, curl will download all of it. E.g., http://www.php.net ignores the header. In that case you can (in addition) set a write-function callback and abort the request once enough data has been received, e.g.:
// php 5.3+ only
// use function writefn($ch, $chunk) { ... } for earlier versions
$writefn = function($ch, $chunk) {
    static $data = '';
    static $limit = 500; // 500 bytes, it's only a test
    $len = strlen($data) + strlen($chunk);
    if ($len >= $limit) {
        $data .= substr($chunk, 0, $limit - strlen($data));
        echo strlen($data), ' ', $data;
        return -1; // returning anything but strlen($chunk) aborts the transfer
    }
    $data .= $chunk;
    return strlen($chunk);
};
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
$result = curl_exec($ch);
curl_close($ch);

- +1 for good answer. This works because $writefn() returns -1 when the limit is reached. When the callback function returns anything other than the number of bytes passed to it (in $chunk), curl aborts the connection. – GZipp Jan 09 '10 at 16:09
- Excellent. When I was doing the same in Perl, I had to use an alarm that triggered and checked for the file size, for the lack of a better method. Very hacky, but it worked. – Artem Russakovskii Jan 09 '10 at 20:03
- Just what I needed. However, also worth mentioning is CURLOPT_BUFFERSIZE, which defines the 'chunk' size. So if your buffer is very big, a single chunk might easily contain a full web page of data anyway (I think!) – Tom Carnell Jun 11 '13 at 06:09
- -1, as this gives invalid results for ranges other than those starting at 0. The answer gives the impression that this works for all ranges and also for servers that don't support partial content. But when a server ignores the range and sends the whole content, only the first bytes get processed, independent of the requested range. To get correct results, the bytes up to the range's offset have to be skipped (see the sketch after these comments). – James Cameron Sep 29 '13 at 15:51
- @CMCDragonkai The solution works fine for all ranges without the write function. As you can see in line 8, the resulting data is extracted from the beginning of the chunk, ignoring that for ranges beyond 0 there is content in front of the actual data that needs to be skipped. – James Cameron Nov 14 '13 at 18:59
- When a write function returns a value other than expected, curl [considers it an error](http://curl.haxx.se/libcurl/c/CURLOPT_WRITEFUNCTION.html) and returns curl error 23 - Failed writing body (18446744073709551615 != 500). There's no way to avoid the error, but you can have your code ignore it. – humbads Dec 03 '15 at 17:35
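Pulling these comments together: below is a minimal, untested sketch of a write callback that handles a non-zero range even when the server ignores the Range header (skipping the bytes before the offset, per James Cameron's point), requests smaller chunks as Tom Carnell suggests, and treats curl error 23 as the expected outcome of aborting, per humbads. The variables $offset, $limit and $datadump are illustrative names, not part of the original answer.
$offset   = 100; // first byte we actually want
$limit    = 100; // number of bytes to keep (i.e. range 100-199)
$datadump = '';

$writefn = function($ch, $chunk) use ($offset, $limit, &$datadump) {
    static $seen = 0;           // total bytes received so far
    $chunklen = strlen($chunk); // must be returned to keep the transfer alive

    // If the server honored the Range header it responds with HTTP 206 and
    // the body already starts at $offset, so nothing needs to be skipped.
    $skip = (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 206) ? 0 : $offset;

    $start = $seen;
    $seen += $chunklen;

    if ($seen <= $skip) {
        return $chunklen; // the entire chunk lies before the wanted offset
    }
    if ($start < $skip) {
        $chunk = substr($chunk, $skip - $start); // drop leading unwanted bytes
    }

    $datadump .= $chunk;
    if (strlen($datadump) >= $limit) {
        $datadump = substr($datadump, 0, $limit);
        return -1; // abort: anything other than $chunklen stops the transfer
    }
    return $chunklen;
};

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, $offset . '-' . ($offset + $limit - 1));
curl_setopt($ch, CURLOPT_BUFFERSIZE, 1024); // smaller chunks; libcurl treats this as a hint only
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);

if (curl_exec($ch) === false && curl_errno($ch) != 23) {
    // error 23 (CURLE_WRITE_ERROR) is expected here: it's how the abort surfaces
    echo 'curl error: ' . curl_error($ch);
}
curl_close($ch);
echo $datadump;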
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this
From the manual; make sure you have a modern curl.
- You are correct; however, I found that it's not always reliable and depends on the server, not on curl itself. In the misbehaving cases, curl would just keep downloading. – Artem Russakovskii Jan 09 '10 at 09:40
- I'm unable to download when I use a different range, for example 100-200. I get "curl error(18)". Can this be solved? – akashrajkn Nov 10 '15 at 03:12
Thanks for the nice solution, VolkerK. However, I needed to use this code as a function, so here's what I came up with. I hope it's useful for others. The main differences are use ($limit, &$datadump), so a limit can be passed in and the by-reference variable $datadump can return the result. I also added CURLOPT_USERAGENT because some websites won't allow access without a user-agent header.
Check http://php.net/manual/en/functions.anonymous.php
function curl_get_contents_partial($url, $limit) {
    $writefn = function($ch, $chunk) use ($limit, &$datadump) {
        static $data = '';
        $len = strlen($data) + strlen($chunk);
        if ($len >= $limit) {
            $data .= substr($chunk, 0, $limit - strlen($data));
            $datadump = $data;
            return -1; // abort the transfer
        }
        $data .= $chunk;
        return strlen($chunk);
    };
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    //curl_setopt($ch, CURLOPT_RANGE, '0-1000'); // not honored by many sites, maybe just remove it altogether
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
    curl_exec($ch); // returns false with error 23 once the callback aborts
    curl_close($ch);
    return $datadump; // note: stays unset if the document is shorter than $limit
}
Usage:
$page = curl_get_contents_partial('http://some.webpage.com', 1000); // read the first 1000 bytes
echo $page; // or do whatever with the result

This could be your solution (download the first 500 KB into output.txt):
curl -r 0-511999 http://www.yourwebsite.com > output.txt
where 511999 is 500*1024 - 1.
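For completeness, the php-curl equivalent of that command might look like the sketch below (assuming the server honors the Range header; output.txt and the URL are placeholders from the answer above):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.yourwebsite.com');
curl_setopt($ch, CURLOPT_RANGE, '0-511999'); // first 500 KB, same as -r 0-511999
$fp = fopen('output.txt', 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);         // stream the body straight to the file
curl_exec($ch);
curl_close($ch);
fclose($fp);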

