97

I’m looking for a quick way to get an HTTP response code from a URL (e.g. 200, 404). I’m not sure which library to use.

alexwlchan

8 Answers

129

Update using the wonderful requests library. Note we are using a HEAD request, which should happen more quickly than a full GET or POST request.

import requests
try:
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the int of the status code. Find more at httpstatusrappers.com :)
except requests.ConnectionError:
    print("failed to connect")
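A slightly more defensive variant of the snippet above (the function name and parameters are my own): requests.head() does not follow redirects by default, so 3xx codes are returned as-is unless you opt in, and a timeout keeps the call from hanging on an unresponsive host.

```python
import requests

def url_status(url, follow_redirects=False, timeout=10):
    """Return the status code for url, or None if the request fails."""
    try:
        r = requests.head(url, allow_redirects=follow_redirects,
                          timeout=timeout)
        return r.status_code
    except requests.RequestException:
        # Covers connection errors, timeouts, invalid URLs, etc.
        return None
```

Catching requests.RequestException (the base class) means timeouts and DNS failures are handled the same way as refused connections.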
Gourneau
  • requests is much better than urllib2; for a link like http://www.dianping.com/promo/208721#mod=4, urllib2 gives me a 404 and requests gives a 200, just as what I get from a browser. – WKPlus Dec 18 '13 at 12:32
  • httpstatusrappers.com...awesome!! My code is on that Lil Jon status, son! – tmthyjames Dec 03 '14 at 02:33
  • This is the best solution. Much better than any of the others. – Awn Apr 25 '15 at 07:58
  • @WKPlus for the record, now `requests` gives `403` for your link, although it's still working in browser. – Dennis Golomazov Mar 28 '17 at 23:02
  • If you track the actual browser requests @DennisGolomazov, you'll see that the "http://..." URL for stackoverflow redirects you to the secure "https://..." equivalent, and the browser follows that redirect. So they're actually exactly the same from both places; it's just that the browser goes ahead and forwards you to the redirect location. – seaders May 07 '18 at 12:24
  • @Gourneau Ha! That wasn't what I intended with my comment; I think it was perfectly fine. In this context, people should try to understand why it "just works" in the browser but returns a 403 in code, when in actuality the same thing is happening in both places. – seaders May 08 '18 at 01:36
65

Here's a solution that uses httplib instead (Python 2; in Python 3 the module was renamed to http.client).

import httplib

def get_status_code(host, path="/"):
    """ This function retrieves the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404
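In Python 3, httplib became http.client; a direct port of the function above might look like the following sketch (the port and timeout keyword arguments are my own additions):

```python
import http.client

def get_status_code(host, path="/", port=None, timeout=10):
    """Return the status code from a HEAD request to host, or None
    if the host cannot be reached or something else goes wrong."""
    try:
        conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status
    except OSError:
        # DNS failures, refused connections, timeouts; does not
        # swallow KeyboardInterrupt the way a bare except would.
        return None
```

As with the Python 2 version, only the headers travel over the wire.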
Evan Fosmark
  • +1 for HEAD request — no need to retrieve the entire entity for a status check. – Ben Blank Jul 16 '09 at 23:46
  • Although you really should restrict that `except` block to at least `StandardError` so that you don't incorrectly catch things like `KeyboardInterrupt`. – Ben Blank Jul 16 '09 at 23:47
  • I was wondering if HEAD requests are reliable, because websites might not have (properly) implemented the HEAD method, which could result in status codes like 404, 501 or 500. Or am I being paranoid? – Blaise Dec 05 '12 at 10:58
  • How would one make this follow 301s? – Randall Hunt Aug 27 '13 at 16:13
  • @Blaise If a website doesn't allow HEAD requests then performing a HEAD request *should* result in a 405 error. For an example of this, try running `curl -I http://www.amazon.com/`. – Nick Jul 22 '14 at 03:12
  • It's not the **_web site_** but rather the server that you should be referring to (@Blaise too). And I doubt that any major server would not honour a `HEAD` request. – Mawg says reinstate Monica Mar 17 '16 at 15:19
  • Some simply close the connection when receiving a `HEAD` request. The web server most likely supports it, but firewalls may not like it. – sdaffa23fdsf Mar 27 '17 at 20:00
  • Pinging an S3 URL that's in Glacier gives a 200 on HEAD and a 403 on GET (which makes sense, as the file is there but not retrievable), so buyer beware! – Alexandre G Aug 14 '22 at 11:53
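To address the reliability concern raised in the comments, one option is to retry with GET when the server rejects HEAD outright. A sketch using Python 3's http.client (the function name is mine):

```python
import http.client

def head_or_get_status(host, path="/", port=None, timeout=10):
    """Try a HEAD request first; retry with GET if the server rejects
    HEAD (405 Method Not Allowed or 501 Not Implemented)."""
    conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
    conn.request("HEAD", path)
    status = conn.getresponse().status
    conn.close()
    if status in (405, 501):
        # Fresh connection: the server may have closed the first one.
        conn = http.client.HTTPConnection(host, port=port, timeout=timeout)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body before closing
        status = resp.status
        conn.close()
    return status
```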
26

You should use urllib2 (Python 2), like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]
RichieHindle
  • This is not a valid solution because urllib2 will follow redirects, so you will not get any 3xx responses. – sorin Jan 31 '13 at 12:35
  • @sorin: That depends - you might well **want** to follow redirects. Perhaps you want to ask the question "If I were to visit this URL with a browser, would it show content or give an error?" In that case, if I changed `http://entrian.com/` to `http://entrian.com/blog` in my example, the resulting 200 would be correct even though it involved a redirect to `http://entrian.com/blog/` (note the trailing slash). – RichieHindle Jan 31 '13 at 14:12
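For those who do want the raw 3xx codes, the redirect-following can be switched off. In Python 3's urllib.request, a redirect handler that returns None makes urlopen raise HTTPError carrying the original 3xx status; a sketch (class and function names are mine):

```python
import urllib.request
from urllib.error import HTTPError

class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect, so the
    # 3xx response surfaces as an HTTPError instead.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def get_status_code(url):
    """Return the status code for url without following redirects."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        return opener.open(url).getcode()
    except HTTPError as e:
        return e.code
```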
9

For those using Python 3 and later, here's another way to find the response code.

import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()
nickanor
3

The urllib2.HTTPError exception does not contain a getcode() method. Use the code attribute instead.

Martijn Pieters
2

Addressing @Niklas R's comment to @nickanor's answer:

from urllib.error import HTTPError
import urllib.request

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url)
        return conn.getcode()
    except HTTPError as e:
        return e.code
E L
1

It depends on multiple factors, but try testing these methods:

import requests

def url_code_status(url):
    try:
        response = requests.head(url, allow_redirects=False)
        return response.status_code
    except Exception as e:
        print(f'[ERROR]: {e}')

or:

import http.client as httplib
import urllib.parse

def url_code_status(url):
    try:
        protocol, host, path, query, fragment = urllib.parse.urlsplit(url)
        path = path or "/"  # an empty path is not a valid request target
        if query:
            path += "?" + query
        if protocol == "http":
            conntype = httplib.HTTPConnection
        elif protocol == "https":
            conntype = httplib.HTTPSConnection
        else:
            raise ValueError("unsupported protocol: " + protocol)
        conn = conntype(host)
        conn.request("HEAD", path)
        resp = conn.getresponse()
        conn.close()
        return resp.status
    except Exception as e:
        print(f'[ERROR]: {e}')

Benchmark results for 100 URLs:

  • First method: 20.90 seconds
  • Second method: 23.15 seconds
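For reference, timings like the ones above can be reproduced with a small harness along these lines (the function name and URL list are placeholders):

```python
import time

def benchmark(status_fn, urls):
    """Return the total seconds taken to check every URL with status_fn."""
    start = time.perf_counter()
    for url in urls:
        status_fn(url)
    return time.perf_counter() - start

# e.g. benchmark(url_code_status, list_of_100_urls)
```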
Mahrez BenHamad
0

Here's an httplib solution that behaves like urllib2: you can just give it a full URL, with no need to split it into hostname and path yourself; the function does that for you.

import httplib
import re
import socket
def get_link_status(url):
  """
    Gets the HTTP status of the url or returns an error associated with it.  Always returns a string.
  """
  https=False
  url=re.sub(r'(.*)#.*$',r'\1',url)
  url=url.split('/',3)
  if len(url) > 3:
    path='/'+url[3]
  else:
    path='/'
  if url[0] == 'http:':
    port=80
  elif url[0] == 'https:':
    port=443
    https=True
  if ':' in url[2]:
    host=url[2].split(':')[0]
    port=int(url[2].split(':')[1])
  else:
    host=url[2]
  try:
    headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
             'Host':host
             }
    if https:
      conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
    else:
      conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
    conn.request(method="HEAD",url=path,headers=headers)
    response=str(conn.getresponse().status)
    conn.close()
  except socket.gaierror,e:
    response="Socket Error (%d): %s" % (e[0],e[1])
  except StandardError,e:
    if hasattr(e,'getcode') and e.getcode():
      response=str(e.getcode())
    elif hasattr(e, 'message') and len(e.message) > 0:
      response=str(e.message)
    elif hasattr(e, 'msg') and len(e.msg) > 0:
      response=str(e.msg)
    elif type('') == type(e):
      response=e
    else:
      response="Exception occurred without a good error message.  Manually check the URL to see the status.  If it is believed this URL is 100% good then file an issue for a potential bug."
  return response
Sam Gleske
  • Not sure why this was downvoted without feedback. It works with HTTP and HTTPS URLs. It uses the HEAD method of HTTP. – Sam Gleske Jul 07 '16 at 07:58