
I am using python 2.7 with the wget module. https://pypi.python.org/pypi/wget

The URL I download from is sometimes unresponsive. The download can take ages, and when this happens, wget times out. How can I get wget to never time out, or at least catch the timeout?

The code to download is simple.

wget.download(url_download)
    Any reason you can't just catch the timeout error and loop to try again (ideally with a sleep to back off, so your code can't accidentally DoS the server)? – ShadowRanger Oct 15 '16 at 01:59
  • Oops. Pardon my ignorance. I didn't know I could do that. How do I catch the timeout? I have edited the question accordingly. – guagay_wk Oct 15 '16 at 02:00
  • 2
    I'm not actually familiar with the `wget` PyPI package, so I don't know what it does in your timeout scenario. Presumably you've observed it happening; what did it do? Raise an exception? Return silently while leaving the local file empty, or setting a status code? Whatever it is, detect it, try again. – ShadowRanger Oct 15 '16 at 02:08
  • http://stackoverflow.com/questions/12624133/wget-with-python-time-limit – Simon Oct 15 '16 at 02:11
  • 1
    On further checking, it looks like you can pass `download` a callback that is updated with the status as you go. Otherwise, it's mostly a simple wrapper around [`urllib.urlretrieve`](https://docs.python.org/2/library/urllib.html#urllib.urlretrieve), so you only get the exception it raises (it will raise if there is a `Content-Length` header and the data received is shorter for instance). I see no real indication that it will do anything to timeout, which likely means it's just "whatever the socket library decides". `wget` is a really simple package, it's not designed for complex use cases. – ShadowRanger Oct 15 '16 at 02:15
  • 1
    @Simon: That's for the `wget` command line tool, which shares nothing but a name (and a few superficial display similarities) with the `wget` package, AFAICT. – ShadowRanger Oct 15 '16 at 02:16
  • @ShadowRanger, seems like wget is not flexible enough for the job. – guagay_wk Oct 15 '16 at 02:20
  • The accepted answer there is, but the solution furthest down seems more suitable than wget. I don't know, maybe wget is a good solution, but natively supported modules work better. – Simon Oct 15 '16 at 02:21
  • @downshift: Don't encourage use of `shell=True`, particularly if string formatting is involved. It's unnecessarily slow, unsafe, and a potential security hole. You should _always_ use `list` based invocation with the default `shell=False` unless there is a _very_ good reason not to (Hint: You're wrong, 99.999% of the time, there isn't a good reason to do so). `subprocess.call(['wget', '--timeout=0', url_download])` is actually shorter, safer, and faster. – ShadowRanger Oct 15 '16 at 03:37
  • Got it, thanks for the correction @ShadowRanger – chickity china chinese chicken Oct 15 '16 at 04:56
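The catch-and-retry approach suggested in the comments can be sketched like this. Two assumptions to verify for your setup: `wget.download` is (per the comments) a thin wrapper over `urllib.urlretrieve`, whose socket-level failures surface as `IOError` in Python 2, and `url_download` is the variable from the question. The exact exception raised may differ depending on the failure mode.

```python
import time


def retry_with_backoff(func, attempts=5, backoff=1.0, sleep=time.sleep):
    """Call func(); on IOError, wait with exponential backoff and retry.

    Re-raises the last IOError once all attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return func()
        except IOError:
            if attempt == attempts - 1:
                raise
            # Wait longer after each failure so we don't hammer the server.
            sleep(backoff * (2 ** attempt))


# Hypothetical usage with the wget package:
# import wget
# retry_with_backoff(lambda: wget.download(url_download))
```

Injecting the `sleep` function keeps the retry logic testable without real waits; in production code you would leave it at the default.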

1 Answer


You could use requests instead:

import requests
requests.get(url_download)

If you don't specify a `timeout` argument, it never times out on its own.
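If you would rather bound how long a hung server can stall you, a minimal sketch (the function name is my own; note that `requests`' `timeout` applies to each socket read, not to the whole transfer):

```python
import requests


def fetch(url, timeout=60):
    """GET url, failing fast if the server stops sending data.

    timeout bounds each connect/read on the socket; a server that
    trickles data slowly can still take longer than `timeout` overall.
    """
    response = requests.get(url, timeout=timeout, stream=True)
    response.raise_for_status()
    chunks = []
    for chunk in response.iter_content(chunk_size=8192):
        chunks.append(chunk)
    return b"".join(chunks)
```

A `requests.exceptions.Timeout` is raised when the limit is hit, which you can catch and combine with the retry loop suggested in the comments above.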
