6

I'm downloading files with wget in Python, and I want to check whether a file already exists before I download it. I know the CLI version has an option for this. What I want is something like the following:

# check if file exists
# if not, download
wget.download(url, path)

wget downloads the file without my having to name it. This is important because I don't want to rename files that already have a name.

If there is an alternative download method that allows checking for existing files, please tell me! Thanks!

Giorgos Myrianthous
aoeu

3 Answers

3

wget.download() doesn't have any such option. The following workaround should do the trick for you:

import subprocess

url = "https://url/to/index.html"
path = "/path/to/save/your/files"
subprocess.run(["wget", "-r", "-nc", "-P", path, url])

If the file is already there, you will get the following message:

File ‘index.html’ already there; not retrieving.

EDIT: If you are running this on Windows, you'd also have to include shell=True:

subprocess.run(["wget", "-r", "-nc", "-P", path, url], shell=True)
Giorgos Myrianthous
  • I get an error: `[WinError 2] The system cannot find the file specified` – aoeu Apr 04 '19 at 21:33
  • @641i130 You can use `subprocess.run(["wget", "-r", "-nc", "-P", path, url], shell=True)` if you are running this on Windows. I have edited my answer to include this option for those running Windows. Hope it helps. – Giorgos Myrianthous Apr 04 '19 at 21:41
  • Thank you! This was very helpful! – aoeu Apr 04 '19 at 21:45
1

As far as I can see, the Python module doesn't have that option.

You could try to guess the filename that will be used (typically it is the part of the URL after the last slash character) and check whether it already exists.
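The filename-guessing idea could look roughly like the sketch below. The names `guess_filename`, `download_if_missing`, and the `downloader` parameter are made up for illustration; `downloader` is assumed to be any callable that accepts `(url, out)`, such as `wget.download`. Note that the guess may not match the name wget would pick by itself, for example when the server suggests a different filename.

```python
import os
import posixpath
from urllib.parse import urlsplit, unquote


def guess_filename(url, default="download"):
    """Guess the local filename from the last path segment of the URL."""
    # urlsplit drops the query string and fragment from the path component.
    path = urlsplit(url).path
    name = posixpath.basename(unquote(path))
    # URLs ending in a slash have no last segment; fall back to a default.
    return name or default


def download_if_missing(url, directory, downloader):
    """Call downloader(url, target) only if the guessed file is absent.

    Returns (target_path, downloaded), where downloaded is False if the
    file was already present and the download was skipped.
    """
    target = os.path.join(directory, guess_filename(url))
    if os.path.exists(target):
        return target, False
    downloader(url, target)
    return target, True
```

With the wget module installed, the call would presumably be `download_if_missing(url, path, wget.download)`.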

Or you could download the file to a new temporary directory and then check if that filename exists in your main directory.
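The temporary-directory approach might be sketched like this. The function name `fetch_unless_present` and the `downloader` parameter are hypothetical; `downloader` is assumed to accept `(url, out_directory)` and return the path of the file it saved, which is how I understand `wget.download` to behave. Keep in mind that this still transfers the file every time; it only avoids overwriting or duplicating a file you already have.

```python
import os
import shutil
import tempfile


def fetch_unless_present(url, directory, downloader):
    """Download into a scratch directory, then keep the file only if new.

    Returns (final_path, kept), where kept is False if a file with the
    same name already existed in `directory`.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: the downloader returns the path of the saved file.
        tmp_file = downloader(url, tmp)
        name = os.path.basename(tmp_file)
        final = os.path.join(directory, name)
        if os.path.exists(final):
            # Already have it; the temporary copy is discarded on exit.
            return final, False
        shutil.move(tmp_file, final)
        return final, True
```

The advantage over guessing is that the name comes from whatever the downloader actually chose, so nothing has to be renamed.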

John Gordon
1

From the source code, the wget.download() function doesn't seem to have the option for additional parameters such as -nc or -N for skipping downloads if the file already exists. Only the CLI version seems to support this.

The function:

def download(url, out=None, bar=bar_adaptive):
    ...

You can only choose the URL, the output path (`out`), and the progress bar callback (`bar`).

nathancy