1

I am trying to catch an unresolved URL passed to a urllib request.

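import socket
import urllib.error
import urllib.request

def getSite(url):
    try:
        with urllib.request.urlopen(url, timeout=2) as r:
            print(url, "was resolved!")
    except (urllib.error.URLError, socket.timeout):
        # Catch only network failures rather than using a bare except
        print(url, "wasn't resolved...")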

I would expect this to attempt a connection to the URL and, if there is no response within 2 seconds, raise an error and print that it wasn't resolved. If it resolves in under 2 seconds, it should respond accordingly. This is what I want to happen: I'd like each request to last no longer than the time I specify.

As it stands, using a valid url provides a speedy response:

> getSite('http://stackoverflow.com')

> http://stackoverflow.com was resolved!
    real    0m0.449s
    user    0m0.063s
    sys     0m0.063s

However, using an invalid url takes much longer than 2 seconds:

> getSite('http://thisisntarealwebaddress.com')

> http://thisisntarealwebaddress.com wasn't resolved...
    real    0m18.605s
    user    0m0.063s
    sys     0m0.047s

What is the timeout parameter really doing, and how can I get the results I want?

Docs: https://docs.python.org/3.1/library/urllib.request.html

pirt

2 Answers

0

I solved this by using the run_with_limited_time function in this answer and running my function like

run_with_limited_time(getSite, (url, ), {}, 2)

I'd still like to hear what others have to say about why timeout doesn't work the way I expect, though!


Copied here for sanity:

from multiprocessing import Process

def run_with_limited_time(func, args, kwargs, time):
    """Runs a function with time limit

    :param func: The function to run
    :param args: The function's args, given as tuple
    :param kwargs: The function's keywords, given as dict
    :param time: The time limit in seconds
    :return: True if the function ended successfully. False if it was terminated.
    """
    p = Process(target=func, args=args, kwargs=kwargs)
    p.start()
    p.join(time)
    if p.is_alive():
        p.terminate()
        return False

    return True
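
For anyone trying to see which stage is actually failing, here is a minimal sketch that classifies the exception coming out of urlopen. The helper name describe_failure is my own (hypothetical, not part of urllib). It assumes that a failed lookup arrives as a URLError wrapping socket.gaierror, and that a timeout arrives as socket.timeout, either wrapped in a URLError or raised directly.

```python
import socket
import urllib.error


def describe_failure(exc):
    """Classify an exception raised by urllib.request.urlopen.

    Hypothetical helper for illustration only.
    """
    # URLError keeps the underlying error in .reason; a timeout while
    # reading the response may be raised directly, without a wrapper.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.gaierror):
        return "name resolution failed"
    if isinstance(reason, socket.timeout):
        return "timed out"
    return "other error"
```

Calling this from the except block of getSite (with the exception bound via `as e`) should show whether the long wait ends in a lookup failure or in a real timeout.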
pirt
  • I'm not marking this as the accepted answer because it doesn't address the first part of my question and I want to hear from others. – pirt Sep 01 '16 at 21:47
0

Just add a timeout option to the urlopen function (it waits for 10 seconds in the example)

file = urllib.request.urlopen("http://www.test.com/resume.pdf", timeout=10)
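
Note that, as far as I can tell, timeout only bounds blocking socket operations such as the connection attempt and reads. The hostname lookup happens before that and does not seem to be covered, which would explain the 18 seconds for a name that doesn't exist. A rough sketch of one workaround, assuming it is acceptable to do the lookup in a worker thread with its own deadline: resolves_within, get_site and the resolver parameter are names I made up for illustration.

```python
import concurrent.futures
import socket
import urllib.request
from urllib.parse import urlsplit


def resolves_within(host, seconds, resolver=socket.getaddrinfo):
    """Return True if `host` resolves within `seconds`, else False."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, host, None)
    try:
        future.result(timeout=seconds)
        return True
    except (concurrent.futures.TimeoutError, OSError):
        # Deadline passed, or the lookup itself failed (gaierror is an OSError)
        return False
    finally:
        # Don't block on a lookup that is still running in the worker
        pool.shutdown(wait=False)


def get_site(url, seconds=2):
    """Bound the lookup first, then let urlopen's timeout bound the rest."""
    host = urlsplit(url).hostname
    if host is None or not resolves_within(host, seconds):
        print(url, "wasn't resolved...")
        return False
    try:
        with urllib.request.urlopen(url, timeout=seconds):
            print(url, "was resolved!")
            return True
    except OSError:  # URLError and socket.timeout are both OSError subclasses
        print(url, "wasn't resolved...")
        return False
```

The abandoned lookup keeps running in its thread until the OS resolver gives up, so this limits how long the caller waits, not how long the lookup takes. The worst case is also roughly twice `seconds`, since the lookup and the request each get their own limit.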