I have a function that is meant to check if a specific HTTP(S) URL is a redirect and, if so, return the new location (but not recursively). It uses the requests library. It looks like this:
try:
    response = http_session.head(sent_url, timeout=(1, 1))
    if response.is_redirect:
        return response.headers["location"]
    return sent_url
except requests.exceptions.Timeout:
    return sent_url
Here, the URL I am checking is sent_url. For reference, this is how I create the session:
http_session = requests.Session()
http_adapter = requests.adapters.HTTPAdapter(max_retries=0)
http_session.mount("http://", http_adapter)
http_session.mount("https://", http_adapter)
However, one of the requirements of this program is that it must work for dead links. Based on this, I set a connection timeout (and a read timeout for good measure). After playing around with the values, it still takes about 5-10 seconds for the request to fail with this stacktrace, no matter what value I choose. (Maybe relevant: in the browser, it gives DNS_PROBE_POSSIBLE.)
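To make this concrete, here is a minimal reproduction of what I am seeing (the hostname is made up; my real dead links behave the same way):

import time
import requests

start = time.monotonic()
try:
    # Tiny connect/read timeouts, yet the failure below is still slow.
    requests.head("http://this-host-does-not-exist.invalid/", timeout=(0.5, 0.5))
except requests.exceptions.RequestException:
    pass
print(f"failed after {time.monotonic() - start:.1f}s")  # still ~5-10 s for me, not 0.5 s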
Now, my problem is: 5-10 seconds is too long to wait for a dead link. There are many links that this program needs to check, and I do not want a few dead links to become such a large bottleneck, which is why I want to configure this DNS lookup timeout.
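To make clear what I mean by that: I would like the name resolution itself to be bounded, roughly like the following sketch, which runs the blocking resolver call on a worker thread and gives up after a deadline (resolve_within is a made-up helper here, not anything requests provides):

import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_resolver_pool = ThreadPoolExecutor(max_workers=4)

def resolve_within(hostname, deadline=0.5):
    # Run the blocking getaddrinfo() on a worker thread and stop waiting
    # after `deadline` seconds. (The thread itself keeps running; this only
    # bounds how long *we* wait before treating the link as dead.)
    future = _resolver_pool.submit(socket.getaddrinfo, hostname, 80)
    try:
        future.result(timeout=deadline)
        return True
    except (FutureTimeout, socket.gaierror):
        return False

(I could run something like this on the hostname before doing the HEAD request and skip anything that does not resolve in time, but that is a workaround rather than a real resolver timeout.)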
I found this post which seems to be relevant (the OP wants to increase the timeout, I want to decrease it), but the solution does not seem applicable: I do not know the IP addresses that these URLs point to. In addition, this feature request from years ago seems relevant, but it did not help me further.
So far, the best solution seems to be to spin up a coroutine for each link (or batch of links) and simply absorb the timeout asynchronously.
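Roughly what I have in mind for that (a sketch only; check_redirect stands in for the function at the top, and since requests is blocking, each call is pushed onto the default thread pool so the dead-link waits overlap instead of adding up):

import asyncio

async def check_all(urls):
    loop = asyncio.get_running_loop()
    # Each blocking check runs on the default executor; the coroutines just
    # gather the results while slow/dead links time out in parallel.
    tasks = [loop.run_in_executor(None, check_redirect, url) for url in urls]
    return await asyncio.gather(*tasks)

# e.g. results = asyncio.run(check_all(batch_of_urls))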
I am on Windows 10; however, this code will be deployed on an Ubuntu server. Both use Python 3.8.
So, how can I best give my HTTP requests a very low DNS resolution timeout for the case where they are fed a dead link?