
I'm developing a leecher website using PHP and cURL.

Here is the basic code:

$ch = curl_init();

$url = "http://somesite.com/somefile.part1.rar";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);

$file_name = basename($url);

$output = curl_exec($ch);

curl_close($ch);

file_put_contents($file_name, $output);

When the file size is small (15MB or so) this code works and the file is leeched to my server, but when the file is big (1GB or so) nothing works.

I've tried setting a 10000M limit for post_max_size, upload_max_filesize and max_file_uploads, but that didn't work.

I've also tried increasing the memory limit to 512M and even -1, but that didn't work either.

So how can I fetch large files using cURL?

Arash Naderi
  • 512MB of RAM isn't enough; you'd need at least 1GB of RAM to store the file contents in a PHP variable and then save it. You should perhaps use CURLOPT_FILE and write directly to a file instead. The script may also be timing out after 60 seconds. – Lawrence Cherone Dec 16 '17 at 10:46
  • Or just [Guzzle](http://docs.guzzlephp.org/en/latest/request-options.html#sink) it: `$client->request('GET', 'http://...', ['sink' => '/path/to/file']);` – Lawrence Cherone Dec 16 '17 at 11:17

1 Answer


What do you think this line does? `curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);` It tells PHP to capture everything curl_exec downloads and hold it all in memory before doing anything else. That is both very slow (you don't start writing to disk until the download is 100% complete, and unless you're running on SSDs, disks are slow) and extremely memory hungry (the entire file sits in RAM at once), and neither of those is desirable. Instead, do `$fp = fopen(basename($url), 'wb'); curl_setopt($ch, CURLOPT_FILE, $fp);` and curl will write the content directly to disk as it is being downloaded, which is much faster and uses only a small amount of RAM no matter how big the file is.
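
To put that together, here is a minimal sketch of the whole download done that way. It is just the two lines above dropped into the question's code; the CURLOPT_FOLLOWLOCATION option and the error check are extra suggestions, not something the question requires:

$url = "http://somesite.com/somefile.part1.rar";
$file_name = basename($url);

// open the target file for binary writing before the transfer starts
$fp = fopen($file_name, 'wb');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FILE, $fp);            // stream the response body straight into $fp
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any

if (curl_exec($ch) === false) {
    echo 'cURL error: ' . curl_error($ch);
}

curl_close($ch);
fclose($fp);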

  • Also note that if you're going to run a large number of slow downloads simultaneously, PHP behind a web server is simply a bad tool for the job: the number of concurrent PHP processes you can run is usually very limited, your entire website stops loading when all of them are busy, PHP aborts when the client disconnects (see ignore_user_abort()), many web servers time out if the script takes too long (see nginx's proxy_read_timeout, for example), and PHP itself will often kill the script for timeout reasons (see set_time_limit()). If that's your situation, consider writing the downloader in another language; Go's goroutines, for example, can handle a massive number of concurrent slow downloads with little resource usage, unlike PHP.
hanshenrik
  • Thank you. I did what you said and it worked! Could you please tell me where and from what books you learned this? I'm far behind when it comes to working with files and downloading and uploading, and I'm working on a project for a leecher site. – Arash Naderi Dec 16 '17 at 15:47
  • @ArashNaderi I don't really read books, and I don't remember exactly (around 2006, from the PHP curl docs, I think). By the way, do you have a name for the project? – hanshenrik Dec 17 '17 at 03:07
  • Thank you. I'm going to name it 'udu' for 'you download upload'. – Arash Naderi Dec 17 '17 at 05:10
  • One other question: when I download a 1GB file it takes about 3 minutes and 20 seconds, but with IDM the same file downloads in around 30 seconds. How can I download files in less time? Something like parallel downloading. – Arash Naderi Dec 17 '17 at 09:10
  • @ArashNaderi downloading with many connections in parallel is another thing PHP is very bad at. PHP *can* do it, with the [socket api](http://php.net/manual/en/book.sockets.php) or the [curl_multi api](http://php.net/manual/en/function.curl-multi-init.php) (there's a rough curl_multi sketch after these comments), but the socket api is very difficult to use, and the curl_multi api uses far more CPU than it should, even with proper use of curl_multi_select & co (not sure why). You'd be much better off using another tool than PHP for that job, maybe mget or wget2 or libtorrent; personally I would probably use Go-curl. – hanshenrik Dec 17 '17 at 15:49
  • @ArashNaderi another thing: many downloads are faster with compression. To enable it with curl, set CURLOPT_ENCODING to the empty string (`curl_setopt($ch,CURLOPT_ENCODING,"");`). That makes curl download the file compressed when the server supports it, and the vast majority of web servers do, which speeds up many downloads. – hanshenrik Dec 17 '17 at 16:14
  • Thank you so much! You're very good at file processing. I've used some custom PHP leeching scripts that were fast at transferring files. In a test I ran today, transferring the same file with my script versus one of those custom PHP scripts showed a 4 minute difference in transfer time! I'm looking for the techniques they use, but unfortunately I don't have a reference or a 101 guide. – Arash Naderi Dec 17 '17 at 18:44
  • And about Go-curl: could you please help me find a 101 guide or crash course for it? I haven't actually used Go at all. Thank you again. – Arash Naderi Dec 17 '17 at 18:46
  • @ArashNaderi No, I don't. By the way, do you have a place to discuss this project that you could link to (a Discord channel, an IRC channel, a forum, something like that)? I'm somewhat intrigued. – hanshenrik Dec 21 '17 at 12:36
  • Unfortunately I don't have a discussion board for my project; I'm working on it on my VPS. And now I'm learning Go! – Arash Naderi Dec 24 '17 at 07:28
  • @ArashNaderi make a stand-alone download daemon in Go, it's well suited for that, and feel free to keep building the website in PHP, because PHP is very good at that. – hanshenrik Dec 24 '17 at 13:17
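
As a rough illustration of the curl_multi approach mentioned in the comments above, the sketch below downloads several files in parallel, each streamed straight to disk with CURLOPT_FILE. The URLs are placeholders, and note that this parallelizes across files; splitting a single file into segments the way IDM does would additionally need HTTP Range requests.

$urls = [
    "http://somesite.com/somefile.part1.rar",
    "http://somesite.com/somefile.part2.rar",
];

$mh = curl_multi_init();
$handles = [];

foreach ($urls as $url) {
    $fp = fopen(basename($url), 'wb');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // write each download straight to its own file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = array($ch, $fp);
}

// drive all transfers, waiting on curl_multi_select between iterations
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);
    }
} while ($active && $status === CURLM_OK);

foreach ($handles as $pair) {
    curl_multi_remove_handle($mh, $pair[0]);
    curl_close($pair[0]);
    fclose($pair[1]);
}
curl_multi_close($mh);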