I have a list of 10^6 URLs whose status codes I want to check. The thing is, requests.get is too slow for me with a timeout specified, and sometimes I cannot be sure whether a URL is valid or not even with a 1 second timeout (say the server's response is slow).

So, currently I do:

import requests

url = "https://dupa.ucho.elo.8"
r = requests.get(url, headers={'Connection': 'close'}, timeout=1)

How can I quickly check whether a URL is valid without setting a timeout, and get an immediate response for invalid URLs?

  • Note 1: I want to avoid the grequests module.
  • Note 2: I do not want to use multithreading.
  • Note 3: I have read https://stackoverflow.com/questions/17782142/why-doesnt-requests-get-return-what-is-the-default-timeout-that-requests-geta, but it involves setting a timeout.
Dariusz Krynicki

1 Answer


While this will not give you lightning speed, since it avoids multithreading, you can check whether the response of each URL contains what you want to see (a 200 status code) and terminate the request right after.

import requests

url_list = ['http://google12121.com/', 'https://google.com/']

for url in url_list:
    try:
        response = requests.get(url)
        # Compare the integer status code directly rather than a substring check
        if response.status_code == 200:
            print("Yes")
        else:
            print("No")
    except requests.exceptions.RequestException as e:
        # All requests-level failures (DNS errors, refused connections, ...) derive from RequestException
        print("Error: " + str(e))
        continue
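
Since only the status code matters here, a HEAD request avoids downloading the response body, which can save noticeable time on large pages. A minimal sketch under that assumption (the URLs are just placeholders):

import requests

url_list = ['http://google12121.com/', 'https://google.com/']

for url in url_list:
    try:
        # HEAD asks the server for headers only, skipping the body
        response = requests.head(url, allow_redirects=True)
        print("Yes" if response.status_code == 200 else "No")
    except requests.exceptions.RequestException as e:
        print("Error: " + str(e))

Note that some servers handle HEAD poorly (returning 405 or incorrect codes), so you may need to fall back to GET for those.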

You might want to catch more specific exceptions, because catching everything broadly is generally bad practice.
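
For instance, here is a sketch that distinguishes the common failure modes (these exception classes are all part of requests; which ones you treat as permanent failures is up to you, and is_ok is just a hypothetical helper name for illustration):

import requests

def is_ok(url):
    try:
        response = requests.get(url)
        return response.status_code == 200
    except requests.exceptions.ConnectionError:
        # DNS failure, refused connection, etc. -- the URL is effectively dead
        return False
    except requests.exceptions.Timeout:
        # The server exists but is slow; you might retry these later
        return False
    except requests.exceptions.RequestException as e:
        # Any other requests-level error (invalid URL, too many redirects, ...)
        print("Unexpected error: " + str(e))
        return False

print(is_ok('https://google.com/'))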

isopach