Is it possible to partially download a remote file with cURL? Let's say the actual filesize of the remote file is 1000 KB. How can I download only the first 500 KB of it?
4 Answers
You can also set the Range header with the php-curl extension:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.spiegel.de/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');      // request bytes 0-500 (inclusive)
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);   // return the body instead of printing it
$result = curl_exec($ch);
curl_close($ch);
echo $result;
But as noted before, if the server doesn't honor this header and sends the whole file, curl will download all of it. E.g., http://www.php.net ignores the header. In that case you can (in addition) set a write-function callback and abort the request once enough data has been received, e.g.:
// php 5.3+ only
// use function writefn($ch, $chunk) { ... } for earlier versions
$writefn = function($ch, $chunk) {
    static $data = '';
    static $limit = 500; // 500 bytes, it's only a test
    $len = strlen($data) + strlen($chunk);
    if ($len >= $limit) {
        $data .= substr($chunk, 0, $limit - strlen($data));
        echo strlen($data), ' ', $data;
        return -1; // returning anything but strlen($chunk) aborts the transfer
    }
    $data .= $chunk;
    return strlen($chunk);
};
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, '0-500');
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
$result = curl_exec($ch);
curl_close($ch);

- +1 for good answer. This works because $writefn() returns -1 when the limit is reached. When the callback function returns anything other than the number of bytes passed to it (in $chunk), curl aborts the connection. – GZipp Jan 09 '10 at 16:09
- Excellent. When I was doing the same in Perl, I had to use an alarm that triggered and checked for the file size, for the lack of a better method. Very hacky, but it worked. – Artem Russakovskii Jan 09 '10 at 20:03
- Just what I needed. However, also worth mentioning is CURLOPT_BUFFERSIZE, which defines the 'chunk' size. So if your buffer is very big, a single chunk might easily contain a full web page of data anyway (I think!) – Tom Carnell Jun 11 '13 at 06:09
- -1, as this gives invalid results for ranges other than those starting at 0. The answer gives the impression that this works for all ranges and also for servers that don't support partial content. But when a server ignores the range and sends the whole content, only the first bytes get processed, independent of the requested range. To get correct results, the bytes up to the range's offset have to be skipped (see the sketch after these comments). – James Cameron Sep 29 '13 at 15:51
- @CMCDragonkai The solution works fine for all ranges without the write function. As you can see in line 8, the resulting data is extracted from the beginning of the chunk, ignoring that for ranges beyond 0 there is content in front of the actual data that needs to be skipped. – James Cameron Nov 14 '13 at 18:59
- When a write function returns a value other than expected, curl [considers it an error](http://curl.haxx.se/libcurl/c/CURLOPT_WRITEFUNCTION.html) and returns curl error 23 - Failed writing body (18446744073709551615 != 500). There's no way to avoid the error, but you can have your code ignore it. – humbads Dec 03 '15 at 17:35
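Pulling these comments together: below is a minimal, untested sketch of a write callback that handles a non-zero range even when the server ignores the Range header (skipping the bytes before the offset, per James Cameron's point), requests smaller chunks as Tom Carnell suggests, and treats curl error 23 as the expected outcome of aborting, per humbads. The variables $offset, $limit and $datadump are illustrative names, not part of the original answer.
$offset   = 100; // first byte we actually want
$limit    = 100; // number of bytes to keep (i.e. range 100-199)
$datadump = '';

$writefn = function($ch, $chunk) use ($offset, $limit, &$datadump) {
    static $seen = 0;           // total bytes received so far
    $chunklen = strlen($chunk); // must be returned to keep the transfer alive

    // If the server honored the Range header it responds with HTTP 206 and
    // the body already starts at $offset, so nothing needs to be skipped.
    $skip = (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 206) ? 0 : $offset;

    $start = $seen;
    $seen += $chunklen;

    if ($seen <= $skip) {
        return $chunklen; // the entire chunk lies before the wanted offset
    }
    if ($start < $skip) {
        $chunk = substr($chunk, $skip - $start); // drop leading unwanted bytes
    }

    $datadump .= $chunk;
    if (strlen($datadump) >= $limit) {
        $datadump = substr($datadump, 0, $limit);
        return -1; // abort: anything other than $chunklen stops the transfer
    }
    return $chunklen;
};

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.php.net/');
curl_setopt($ch, CURLOPT_RANGE, $offset . '-' . ($offset + $limit - 1));
curl_setopt($ch, CURLOPT_BUFFERSIZE, 1024); // smaller chunks; libcurl treats this as a hint only
curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);

if (curl_exec($ch) === false && curl_errno($ch) != 23) {
    // error 23 (CURLE_WRITE_ERROR) is expected here: it's how the abort surfaces
    echo 'curl error: ' . curl_error($ch);
}
curl_close($ch);
echo $datadump;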
Get the first 100 bytes of a document:
curl -r 0-99 http://www.get.this
From the manual; make sure you have a modern curl.
- You are correct; however, I found that it's not always reliable and depends on the server, not on curl itself. In the misbehaving cases, curl would just keep downloading. – Artem Russakovskii Jan 09 '10 at 09:40
- I'm unable to download when I use a different range, for example 100-200. I get "curl error(18)". Can this be solved? – akashrajkn Nov 10 '15 at 03:12
Thanks for the nice solution, VolkerK. However, I needed to use this code as a function, so here's what I came up with. I hope it's useful for others. The main differences are use ($limit, &$datadump), so a limit can be passed in and the by-reference variable $datadump can return the result. I also added CURLOPT_USERAGENT because some websites won't allow access without a user-agent header.
Check http://php.net/manual/en/functions.anonymous.php
function curl_get_contents_partial($url, $limit) {
    $writefn = function($ch, $chunk) use ($limit, &$datadump) {
        static $data = '';
        $len = strlen($data) + strlen($chunk);
        if ($len >= $limit) {
            $data .= substr($chunk, 0, $limit - strlen($data));
            $datadump = $data;
            return -1; // abort the transfer
        }
        $data .= $chunk;
        return strlen($chunk);
    };
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    //curl_setopt($ch, CURLOPT_RANGE, '0-1000'); // not honored by many sites, maybe just remove it altogether
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writefn);
    curl_exec($ch); // returns false with error 23 once the callback aborts
    curl_close($ch);
    return $datadump; // note: stays unset if the document is shorter than $limit
}
Usage:
$page = curl_get_contents_partial('http://some.webpage.com', 1000); // read the first 1000 bytes
echo $page; // or do whatever with the result

This could be your solution (download the first 500 KB into output.txt):
curl -r 0-511999 http://www.yourwebsite.com > output.txt
where 511999 is 500*1024 - 1.
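For completeness, the php-curl equivalent of that command might look like the sketch below (assuming the server honors the Range header; output.txt and the URL are placeholders from the answer above):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.yourwebsite.com');
curl_setopt($ch, CURLOPT_RANGE, '0-511999'); // first 500 KB, same as -r 0-511999
$fp = fopen('output.txt', 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);         // stream the body straight to the file
curl_exec($ch);
curl_close($ch);
fclose($fp);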

