
I have a list of 13 million files, about 140 GB in total; each file is roughly 100 KB-2 MB. Note: I need to maintain the directory structure, and the list contains only URLs, one per line.

I use:

wget -x -i file_list.txt

This works, but it is too slow.

My server has 100 Mbit of bandwidth, so I should be getting about 10 MB/s, but wget -x -i file_list.txt only gives me about 1 MB/s.

How can I fix this?

jason

3 Answers


You can use the parallel command:

parallel -a file_list.txt --jobs 20 'wget -x'

Here -a reads the input lines from the file, --jobs sets how many jobs run in parallel, and each line from the file is appended to the wget -x command.
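If you also want to keep wget quiet and watch overall progress, a possible variant is the sketch below (the {} placeholder and --bar are standard GNU parallel features; tune --jobs to whatever the remote server tolerates):

# one URL per line in file_list.txt; -x recreates the directory structure
parallel --bar --jobs 20 -a file_list.txt 'wget -x -q {}'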

jason

You could start wget multiple times simultaneously with the following flags:

wget -x -N -i file_list.txt &
wget -x -N -i file_list.txt &
wget -x -N -i file_list.txt ...

The -N flag stands for:

-N,  --timestamping              don't re-retrieve files unless newer than local

If you still run into problems, you could try it with -r / -np; see: multiple wget -r a site simultaneously?
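If you would rather have each instance work on a disjoint part of the list instead of relying on -N to skip files the other instances already fetched, one possible sketch (assuming bash and GNU split; the chunk_ prefix is arbitrary):

# split the URL list into 8 roughly equal chunks without breaking lines
split -n l/8 file_list.txt chunk_

# start one wget per chunk; -x keeps the directory structure
for f in chunk_*; do
    wget -x -q -i "$f" &
done
wait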

Johannes

You could possibly increase performance by running multiple wget instances. You can use a for loop to do this, but if the remote server itself is only serving at 1 Mbps overall, you will be stuck at that speed no matter how many instances you start.

Check out Parallel wget in Bash for more information on creating multiple downloads.
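For example, a minimal sketch of the same idea using xargs instead of an explicit for loop (assuming GNU xargs, whose -P option runs commands in parallel and whose -d option splits input on newlines):

# run up to 16 wget processes at a time, one URL per invocation
xargs -d '\n' -P 16 -n 1 wget -x -q < file_list.txt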
