
As seen here, max_retries can be set for a requests.Session(), but I only need the status_code from a head request to check whether a URL is valid and active.

Is there a way to just make the head request within a mounted session?

import requests

def valid_active_url(url):
    try:
        site_ping = requests.head(url, allow_redirects=True)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False  # connection failed, so site_ping was never set

    try:
        # Anything below 400 (2xx success, 3xx redirect) counts as active.
        return site_ping.status_code < 400
    except Exception:
        return False

Based on the docs, I am thinking I need to either:

  • see if the result of the session.mount method returns a status code (which I haven't found yet)
  • roll my own retry method, perhaps with a decorator like this or this, or a (less eloquent) loop like this (a rough sketch follows this list).
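
For reference, a rough sketch of the loop option (the helper name, retry count, and delay here are just illustrative, not taken from any of the linked answers):

import time
import requests

def head_with_retries(url, retries=3, delay=1):
    # Try a HEAD request up to `retries` times before giving up.
    for attempt in range(retries):
        try:
            return requests.head(url, allow_redirects=True)
        except requests.exceptions.ConnectionError:
            if attempt == retries - 1:
                raise  # out of attempts; let the caller handle it
            time.sleep(delay)  # brief pause before retrying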

In terms of the first approach I have tried:

s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)
s.mount('http://www.redirected-domain.org', a)
resp = s.get('http://www.redirected-domain.org')
resp.status_code

Are we only using s.mount() to get in and set max_retries? That seems redundant, aside from the fact that the HTTP connection would be pre-established.
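
If I understand the docs correctly, s.mount() doesn't make a request at all; it just registers the adapter (and its max_retries) for every URL on that session that starts with the given prefix. A sketch of that, using a placeholder URL:

import requests

s = requests.Session()
a = requests.adapters.HTTPAdapter(max_retries=3)

# Any request on this session whose URL starts with one of these
# prefixes goes through the adapter and inherits its retry setting.
s.mount('http://', a)
s.mount('https://', a)

# The request itself is still made with s.get()/s.head() as usual.
head_resp = s.head('http://www.example.org', allow_redirects=True)
print(head_resp.status_code)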

Also, resp.status_code returns 200 where I am expecting a 301 (which is what requests.head returns).

NOTE: resp.ok might be all I need for my purposes here.
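
Side note on the 200 vs. 301: as far as I can tell, get() follows redirects by default and reports the final status, while a bare requests.head() does not unless allow_redirects=True is passed, which would explain the difference. A quick check, with a placeholder redirecting URL:

import requests

url = 'http://www.redirected-domain.org'  # placeholder redirecting URL

r1 = requests.head(url)                        # HEAD does not follow redirects by default
r2 = requests.head(url, allow_redirects=True)  # follows through to the final location
r3 = requests.get(url)                         # GET follows redirects by default

print(r1.status_code)                  # e.g. 301, the redirect itself
print(r2.status_code, r3.status_code)  # e.g. 200 200, the final destination
print(r3.history)                      # the chain of intermediate redirect responses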

    `.mount` doesn't make a request. *"The mount call registers a specific instance of a Transport Adapter to a prefix. Once mounted, any HTTP request made using that session whose URL starts with the given prefix will use the given Transport Adapter."* – jonrsharpe Mar 12 '19 at 17:14
  • Yea. I guess instead of just letting my eyes glaze over at "Transport Adapter to a prefix", I should look up what that means. – MikeiLL Mar 12 '19 at 17:18

1 Answer


After a mere two hours of developing the question, the answer took five minutes:

import requests

def valid_url(url):
    if (url.lower() == 'none') or (url == ''):
        return False
    try:
        s = requests.Session()
        a = requests.adapters.HTTPAdapter(max_retries=5)
        s.mount(url, a)
        resp = s.head(url)
        return resp.ok
    except requests.exceptions.MissingSchema:
        # If the URL is missing the scheme, try again with 'http://' prepended
        return valid_url('http://' + url)
    except requests.exceptions.ConnectionError:
        print('Error trying to connect to {}.'.format(url))
        return False

Based on this answer, it looks like the head request will be slightly less resource-intensive than the get, particularly if the URL points to a large amount of data, since a HEAD response carries headers but no body.
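
For instance (with a placeholder URL):

import requests

resp = requests.head('http://www.example.org')  # placeholder URL
print(resp.headers.get('Content-Length'))       # size the server reports for the resource, if any
print(len(resp.content))                        # 0 -- no body is transferred for a HEAD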

requests.adapters.HTTPAdapter is the built-in adapter for the urllib3 library that underlies Requests.
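
Because it is backed by urllib3, max_retries can also be given a urllib3 Retry object instead of a plain integer when finer control (backoff, which status codes to retry) is wanted. A sketch, with the retry settings chosen arbitrarily:

import requests
from urllib3.util.retry import Retry

retries = Retry(total=5, backoff_factor=0.2,
                status_forcelist=[500, 502, 503, 504])
adapter = requests.adapters.HTTPAdapter(max_retries=retries)

s = requests.Session()
s.mount('http://', adapter)
s.mount('https://', adapter)
resp = s.head('http://www.example.org')  # placeholder URL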

On another note, I'm not sure what the correct term is for what I'm checking here: a URL could still be valid even if it returns an error code.

  • How's that different to *"the first approach I have tried"*? – jonrsharpe Mar 12 '19 at 17:15
  • The first approach didn't have retries in it. I kept getting the connection error on some URLs that were 301 forwards – MikeiLL Mar 12 '19 at 17:16
  • I mean the part where you say what I quoted. I think you'd already answered this by the time you wrote the question, just not wrapped it in error handling! – jonrsharpe Mar 12 '19 at 17:17
  • The only difference is that in the solution I replace `s.get` with `s.head`, which I hope requires less network activity, but after all that work writing the question, I thought it was worth posting. If you have time to post a more eloquent answer, I'd love to accept it. – MikeiLL Mar 12 '19 at 17:22