311

This is the simplest example of running wget:

wget http://www.example.com/images/misc/pic.png

but how do I make wget skip the download if pic.png is already available?

HoldOffHunger
nais inpoh gan

6 Answers

391

Try the following parameter:

-nc, --no-clobber: skip downloads that would download to existing files.

Sample usage:

wget -nc http://example.com/pic.png
kenorb
plundra
  • 9
    As noted on the linked question, I disagree - If no-clobber is used and the filename exists it exits. No HEAD request even. Even if this wasn't the case, check if you have a file to begin with :-) `[ ! -e "$(basename $URL)" ] && wget $URL` – plundra Oct 21 '15 at 11:56
  • 4
    I think I may be getting different results because I'm using the `--recursive` option. – ma11hew28 Oct 22 '15 at 01:48
  • 3
    Great answer! Going to disagree with ma11hew28. I just tested this on a list of 3,000 URL's with GNU Wget 1.14 and `wget -nc -i list.txt`. Don't think it's possible for a server to crawl 3k links in a tenth of a second! – HoldOffHunger May 13 '21 at 12:00
  • 2
    Additionally, `-N, --timestamping` says `don't re-retrieve files unless newer than local` if you are looking to sync, in-case some remote files might ACTUALLY be worth re-downloading (edit: I see another answer now that says the same). – bunkerdive Aug 30 '21 at 20:13
279

The -nc, --no-clobber option isn't the best solution, as newer files will not be downloaded. One should use -N instead, which will download and overwrite the file only if the server has a newer version, so the correct answer is:

wget -N http://www.example.com/images/misc/pic.png

When running Wget with -N, with or without -r or -p, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

-N, --timestamping: Turn on time-stamping.
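
A minimal sketch of that combination (the recursive flags and the site root here are assumptions for illustration, not part of the question):

# re-download only files the server reports as newer; -r/-p turn it into a page mirror
wget -N -r -p http://www.example.com/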

sdgfsdh
Daniel Sokolowski
  • 61
    When the server is not configured properly, `-N` may fail and wget will always redownload. So sometimes `-nc` is the better solution. – user Feb 23 '14 at 18:43
  • 2
    what could be the applicable scenario where 'When server is not configured properly' would occur? – AjayKumarBasuthkar Jul 17 '15 at 13:56
  • 2
    when you are downloading from a location that was copied, changing all the timestamps. – Robert Oct 28 '16 at 16:22
  • 1
    Whether this is best depends on context. For example, I'm downloading ~1600 files from a list, and then updated the list to include some more files. The files don't change so I don't care about the latest version and I don't want it to check the server for new versions of the 1600 files that I already have. – JBentley Oct 03 '17 at 19:45
  • 4
    @AjayKumarBasuthkar: When the server doesn't support any way of checking for newer file, `wget` will complain `Last-modified header missing`; this is exactly the situation outlined. – Piskvor left the building Feb 21 '18 at 09:58
  • If download failures are possible, -N should be avoided. wget only applies timestamps when downloads are complete, so failed downloads end up with current timestamps that are always newer than on the server, causing those downloads to remain in a partial state until a future update on the server. – Kenneth M. Kolano Nov 28 '18 at 19:24
  • Beware that `-O` always creates a new file with a current timestamp and hence `-N` and `-O` exclude each other. – Suuuehgi Jun 16 '21 at 16:14
  • While all problems mentioned with the `-N` option are real, the use of `-nc` does not solve them. If something goes bad as described above (partial download, incorrect time stamp), the file will exist locally, so `-nc` will force `wget` to skip it. – Kostas Oct 15 '21 at 12:14
  • So in other words, wget is incapable of solving this issue reliably. It would be just better to use aria2 when dealing with large file downloads. No fuss with that, as it detects same files by default too. Example code: `aria2c --file-allocation=none -c -x 16 -s 16 --log-level=warn --summary-interval=1 {File_URL} -d {Directory} {Filename}` – Testerhood Jan 25 '23 at 13:47
34

The answer I was looking for is at https://unix.stackexchange.com/a/9557/114862.

Using the -c flag when the local file is of greater or equal size to the server version will avoid re-downloading.
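
For example, a minimal sketch using the URL from the question:

# -c continues a partial download and skips the file if the local copy is already complete
wget -c http://www.example.com/images/misc/pic.png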

jsta
  • 2
    This is especially great when you are downloading a bunch of files with the -i flag. `wget -i filelist.txt -c` will resume a failed download of a list of files. – Trevor Sep 06 '18 at 04:30
  • 2
    I am downloading from a server which provides neither the Length header nor the Last-modified header (mentioned elsewhere on this page). So, I'd like to check *only* if a file with the same name exists on the disk and skip the re-download if it does. Still looking for that solution. – daveloyall Sep 11 '20 at 22:27
  • 4
    `-c` means `continue`. If the file was changed to a bigger file with different content, wget will start the download at the end of the local file and append the new contents. You may end up with garbage. – Soerendip May 04 '21 at 22:23
27

When running Wget with -r or -p, but without -N, -nd, or -nc, re-downloading a file will result in the new copy simply overwriting the old.

So adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.
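
A short sketch of that combination (the recursive crawl of example.com is an assumption for illustration):

# existing local files are kept; wget does not re-fetch or renumber them
wget -r -nc http://www.example.com/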

See more info in the GNU Wget manual.

kenorb
Mahesh
  • The following doesn't work as it is not recursive. The already downloaded files should be parsed for links so that everything that's missing is downloaded: wget -w 10 -r -nc -l inf --no-remove-listing -H "" – Terje Oseberg Jan 18 '23 at 15:32
8
-nc, --no-clobber

If a file is downloaded more than once in the same directory, wget's behavior depends on a few options, including -nc. In certain cases, the local file is "clobbered" (overwritten) upon repeated download. In other cases, it is preserved.

When running wget without -N, -nc, or -r, downloading the same file in the same directory results in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy is named file.2, etc. When -nc is specified, this behavior is suppressed, and wget refuses to download newer copies of file. Therefore, "no-clobber" is a misnomer in this mode: it's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's being turned off.

When running wget with -r, but without -N or -nc, re-downloading a file results in the new copy overwriting the old. Adding -nc prevents this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running wget with -N, with or without -r, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

Note that when -nc is specified, files with the suffixes .html or .htm are loaded from the local disk and parsed as if they had been retrieved from the web.
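
To illustrate with the question's URL (a sketch of the documented behavior, not verified output):

wget http://www.example.com/images/misc/pic.png      # first run saves pic.png
wget http://www.example.com/images/misc/pic.png      # second run saves pic.png.1
wget -nc http://www.example.com/images/misc/pic.png  # skipped: pic.png already exists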

Engr Ali
2

I had issues with -N as I wanted to save output to a different file name.

From the Timestamping section of the wget docs:

A file is considered new if one of these two conditions is met:

  1. A file of that name does not already exist locally.
  2. A file of that name does exist, but the remote file was modified more recently than the local file.

Using test:

test -f stackoverflow.html || wget -O stackoverflow.html https://stackoverflow.com/

If the file does not exist, test will evaluate to FALSE, so wget will be executed.
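
A variant of the same idea that derives the local file name from the URL instead of hard-coding it (the URL and the $URL variable are illustrative, as in the earlier comment):

URL=http://www.example.com/images/misc/pic.png
# download only if a file named like the last path component does not exist yet
[ -f "$(basename "$URL")" ] || wget "$URL"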

rdmolony