1

I have a list of around 6000 URLs, and for each one I have to check whether its response code is 200. Using a normal request.get() with urllib takes a lot of time, because it loads the entire HTML page before giving back the response code.

Is there any way to get just the response code of a URL without loading the entire web page in the backend?

Mahesh
  • You can use the `requests.head` method to fetch only the headers for the URL, which should reduce the data being fetched. The W3 HTTP protocol page has more details regarding this: https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.4 – JRajan Jan 13 '20 at 08:41
  • In addition to sending only a HEAD request, you may want to parallelize your code using either [multiprocessing](https://docs.python.org/3.8/library/multiprocessing.html#module-multiprocessing) or [asyncio](https://skipperkongen.dk/2016/09/09/easy-parallel-http-requests-with-python-and-asyncio/). – bruno desthuilliers Jan 13 '20 at 09:43

2 Answers

1

You should just ping them:

import os
hosts = [
    'google.com', 
    ...
]

for host in hosts:
    response = os.system(f"ping -c 1 {host}")
    if response == 0:
        print('host is up')
    else:
        print('host is down')
Lord Elrond
  • @Monica, for "https://careers.microsoft.com/", the response code is 512, which is unassigned, but the URL should give 200. Can you please suggest a fix for this? – Mahesh Jan 13 '20 at 09:05
  • "ping" will only tell you if the host is reachable AND responds to ICMP (not HTTP) ping requests. – bruno desthuilliers Jan 13 '20 at 09:33
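
A note on the 512 reported above: on Unix, `os.system` returns an encoded wait status rather than an HTTP status code or the command's plain exit code. The sketch below decodes it; the helper name `exit_code_from_status` is made up for illustration, and `os.waitstatus_to_exitcode` assumes Python 3.9 or later. An exit code of 2 from `ping` most likely means it failed with an error, for example because it was handed a full URL with a scheme instead of a bare host name, but that reading is an assumption about the local `ping` implementation.

```python
import os


def exit_code_from_status(status: int) -> int:
    """Decode the value returned by os.system() into the command's exit code."""
    # On Unix the exit code is stored in the high byte of the wait status.
    return os.waitstatus_to_exitcode(status)


# The 512 from the comment above decodes to exit code 2.
print(exit_code_from_status(512))
```

Either way, the number says nothing about whether the web server would answer an HTTP request with 200.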
1

You can use the HEAD method to fetch only the headers of a URL, without downloading the response body. It would look something like this:

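from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

urls = [
    'http://google.com',
]

for url in urls:
    # Send a HEAD request so only the headers are fetched, not the body.
    req = Request(url, method='HEAD')
    try:
        with urlopen(req) as res:
            status = res.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        status = err.code
    except URLError:
        # The request never got an HTTP response (DNS failure, refused connection, ...).
        status = None
    if status == 200:
        print("Up")
    else:
        print("Down")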

The code has been adapted from https://stackoverflow.com/a/4421485/690576
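
With around 6000 URLs, the comments on the question also suggest running the checks in parallel. The following is a minimal sketch of that idea using a thread pool from the standard library instead of multiprocessing or asyncio. The names `head_status` and `check_all`, the 10-second timeout, and the pool size of 32 are all assumptions chosen for illustration, not values from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def head_status(url, timeout=10):
    """Return the status code of a HEAD request, or None if no response arrived."""
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as res:
            return res.status
    except HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is on the exception.
        return err.code
    except OSError:
        # URLError, timeouts, and refused connections are all OSError subclasses.
        return None


def check_all(urls, fetch=head_status, max_workers=32):
    """Check a list of URLs concurrently and return a {url: status} dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input list.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Calling `check_all(urls)` returns a dictionary in which any value other than 200 can be treated as "Down". Threads fit here because the work is mostly waiting on the network. Some servers reject HEAD requests (for example with 405), so a fallback to GET for those URLs may be needed.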

JRajan