10

Is there any command, or wget with the right options, for multithreaded downloading?

That is, for downloading a site recursively and simultaneously?

c2h2

4 Answers

12

I found a decent solution.

Read the original at http://www.linuxquestions.org/questions/linux-networking-3/wget-multi-threaded-downloading-457375/

wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &
wget -r -np -N [url] &

Copy the line as many times as you deem fitting to get as many processes downloading as you want. This isn't as elegant as a properly multithreaded app, but it gets the job done with only a slight amount of overhead. The key here is the "-N" switch, which means: transfer the file only if it is newer than what's on disk. This (mostly) prevents each process from downloading a file a different process has already downloaded; instead it skips that file and downloads what the other processes haven't. It uses the timestamp as the means of doing this, hence the slight overhead.

It works great for me and saves a lot of time. Don't run too many processes at once, as this may saturate the web site's connection and tick off the owner. Keep it to a maximum of around 4; beyond that, the number is limited only by CPU and network bandwidth on both ends.
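A minimal sketch of the same idea as a shell loop, so you don't have to paste the line by hand (assuming a POSIX shell; the URL and the count of 4 are placeholders):

#!/bin/sh
# Placeholder URL - substitute the site you want to mirror.
URL="http://example.com/"
# Start 4 background wget processes over the same tree; -N (timestamping)
# keeps each process from re-downloading files another one already fetched.
for i in 1 2 3 4; do
    wget -r -np -N "$URL" &
done
wait    # return only once all background downloads have finished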

Julian
4

Using xargs to run several wget processes in parallel, this solution seems much better:

https://stackoverflow.com/a/11850469/1647809
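For reference, the gist of that approach is a one-liner like the following, assuming GNU xargs and a file urls.txt (a placeholder) listing one URL per line:

# Run up to 8 wget processes at once, one URL per invocation.
xargs -n 1 -P 8 wget -q < urls.txt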

sandyp
  • It is good only when you know all the downloadable URLs in advance. That is not the case when you want to mirror a site. – Ray Jan 03 '18 at 14:49
3

Use axel to download with multiple connections:

apt-get install axel

axel http://example.com/file.zip
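axel's -n option sets how many connections it opens for a single download; for example (the URL is the same placeholder as above):

# Fetch one file over 10 parallel connections.
axel -n 10 http://example.com/file.zip

Note that, as the comment below points out, this accelerates single-file downloads; it is not a recursive mirroring solution.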
Mohsen
  • or aget http://www.enderunix.org/aget/ but these aren't recursive solutions (good for other people who got to this question looking for one, though) – Orwellophile Dec 18 '13 at 08:16
2

Well, you can always run multiple instances of wget, no?

Example:

wget -r http://somesite.example.org/ &
wget -r http://othersite.example.net/ &

etc. This syntax will work in any Unix-like environment (e.g. Linux or macOS); I'm not sure how to do this in Windows.

Wget itself does not support multithreaded operation; at least, neither the manpage nor its website mentions this. Anyway, since wget supports HTTP keep-alive, the bottleneck is usually the bandwidth of the connection, not the number of simultaneous downloads.

Piskvor left the building
  • @c2h2: According to the wget manpage ( http://linux.die.net/man/1/wget ) and wget docs on its website ( http://www.gnu.org/software/wget/manual/wget.html ), there is no such option (or anything similar) - `wget` is single-threaded. Sorry. – Piskvor left the building Jan 20 '11 at 12:53