
I have a list of filenames that exist in a directory online. What is the best way to download them all? For example I want to get the following files:

516d0f278f14d6a2fd2d99d326bed18b.jpg
b09de91688d13a1c45dda8756dadc8e6.jpg
366f737007417ea3aaafc5826aefe490.jpg

from the following directory:

http://media.shopatron.com/media/mfg/10079/product_image/

Maybe something like this:

for i in $(cat filelist.txt); do
    wget "http://media.shopatron.com/media/mfg/10079/product_image/$i"
done
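
The per-file loop can also be collapsed into a single wget invocation by building full URLs first and handing the list to wget's -i option. A minimal sketch, assuming a POSIX shell with sed and wget available; the printf block is a stand-in for the real filelist.txt:

```shell
# stand-in list; in practice this is the existing filelist.txt
printf '%s\n' \
  516d0f278f14d6a2fd2d99d326bed18b.jpg \
  b09de91688d13a1c45dda8756dadc8e6.jpg \
  366f737007417ea3aaafc5826aefe490.jpg > filelist.txt

# prepend the base URL to every filename
base='http://media.shopatron.com/media/mfg/10079/product_image/'
sed "s|^|$base|" filelist.txt > urls.txt

# network step: wget reads the full URL list and downloads each one
# wget -i urls.txt
```

Keeping the downloads in one wget invocation avoids spawning a new process per file.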

Any ideas?

tacudtap
  • Possible duplicate: http://stackoverflow.com/questions/15436388/download-multiple-images-from-remote-server-with-php-a-lot-of-images The answer there seems to apply here, too. – frnhr Oct 31 '13 at 14:39

3 Answers

$list = file_get_contents('path/to/filelist.txt');
$files = explode("\n", $list); ## Explode around new-line.
foreach ($files as $file) {
   $file = trim($file);          ## Strip stray whitespace/carriage returns.
   if ($file === '') continue;   ## Skip blank lines.
   file_put_contents($file, file_get_contents('http://url/to/file/' . $file));
}

Basically you explode the list around the new-line to get each filename, and then file_put_contents writes each file locally as soon as file_get_contents pulls it down from wherever you are getting them from.

David
  • `file_get_contents()` might not work on some (or "many") hosts, see: http://stackoverflow.com/questions/7794604/file-get-contents-not-working – frnhr Oct 31 '13 at 14:37
$files = file('filelist.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);  //load each line into an array, without trailing newlines
$dest = '/tmp/';  //your destination dir
$url_base = 'http://media.shopatron.com/media/mfg/10079/product_image/';

foreach($files as $f) {
   file_put_contents($dest.$f, file_get_contents($url_base.$f));
}

Pretty self-explanatory, but one point: if you're unsure of filelist.txt's contents, you should sanitize the filenames before using them.
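
On that cleaning point, a minimal sketch of one such check (shown in shell for illustration; PHP's basename() behaves the same way) is to strip any path components so a hostile list entry can't write outside the destination dir. The '../../etc/passwd' entry below is a hypothetical example:

```shell
# hypothetical hostile entry in filelist.txt
f='../../etc/passwd'

# basename drops everything up to the last slash, leaving a bare filename
safe=$(basename "$f")
echo "$safe"   # -> passwd
```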

iamdev
  • In response to Pat's question below: both versions are [IO bound](http://en.wikipedia.org/wiki/I/O_bound), so speed won't be very different. However, file_get_contents is superior to your wget solution for several reasons: 1) running exec introduces big security risks and is disabled on some systems, 2) some servers don't have wget installed. – iamdev Oct 31 '13 at 17:38
  • I see, but according to http://stackoverflow.com/a/7794628/2097294 using `file_get_contents` also may not be enabled on some servers, right? – tacudtap Oct 31 '13 at 17:55
  • You're right. Security risk is the bigger issue. And while there are risks with both fopen and exec, one generally wants to minimize use of the commands that allow running arbitrary commands on your OS (exec, system, passthru). For that reason, exec is usually a last resort for PHP coders. – iamdev Oct 31 '13 at 18:40
  • OK, I understand now but I just tried your code above and it is not working for me. :-( – tacudtap Oct 31 '13 at 19:00
  • 1. The way this code works, filelist.txt needs to have each filename on a separate line. 2. filelist.txt needs to be in the same folder as the script. 3. The filenames must have no extra white-space. And 4. to get help, it would help if you shared the error you got. – iamdev Nov 01 '13 at 04:06

Here is what I came up with while waiting for answers.

<?php
$handle = @fopen("inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        $file = trim($buffer); // fgets keeps the trailing newline; strip it
        exec("wget " . escapeshellarg("http://media.shopatron.com/media/mfg/10079/product_image/$file"));
        echo "File ( $file ) downloaded!<br>";
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}

I got this by modifying the example from the PHP fgets manual page. I also set max_execution_time = 0 (unlimited) so the script isn't killed partway through a long download run.

If someone can prove their method is more efficient I will gladly mark their answer as accepted. Thank you all for your answers!

tacudtap