
I know that it is possible to check if a URL redirects, as mentioned in the following question and its answer.

How to check if the url redirect to another url using Python

It can be done with the following code:

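import urllib2  # Python 2 only; in Python 3 this module became urllib.request

req = urllib2.Request(url=url, headers=headers)
resp = urllib2.urlopen(req, timeout=3)  # urlopen follows any redirects
redirected = resp.geturl() != url  # True if the final URL differs from the requested one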
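The snippet above is Python 2 (`urllib2`). Below is a minimal Python 3 sketch of the same check, assuming only the standard-library `urllib.request` module; `is_redirected` is a hypothetical helper name. Keep in mind that `urlopen` follows redirects by default, so this approach does open a connection to whatever site the URL redirects to.

```python
import urllib.request


def is_redirected(url, headers=None, timeout=3):
    """Python 3 version of the urllib2 check (hypothetical helper name).

    urlopen() follows redirects automatically, so by the time geturl()
    is called the client has already connected to the final URL.
    """
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # geturl() reports the URL that was finally fetched.
        return resp.geturl() != url
```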

However, I have a list of millions of URLs, and it is currently under discussion whether any of them is a harmful URL or redirects to one.

Is it possible to check whether a URL redirects without opening a connection to the site it redirects to, so that I avoid connecting to a harmful website?

Kev1n91
  • Do you mean "connect to the original website but not to the redirected website", or "not even create a connection at all"? The latter is impossible. – Sraw Jun 29 '18 at 09:43
  • _“to avoid creating a connection with a harmful website”_ - why? In what way do you imagine any “harmful website” could actually do any damage to your Python script? – CBroe Jun 29 '18 at 09:44
  • Some websites will automatically download binary code to your PC upon creating a connection @CBroe – Kev1n91 Jun 29 '18 at 09:46
  • @Sraw - the first one, I will edit the question – Kev1n91 Jun 29 '18 at 09:46
  • The only way to do this would be to connect to every URL and check whether it redirects. You can check this without connecting to the other website. Redirects are most often done through 3xx status codes, with the target given in the Location header. Of course, JavaScript on the website may also perform a redirect, but that would be harder to detect without actually running it. – Quaisaq Anderson Jun 29 '18 at 09:46
  • @QuaisaqAnderson so, just to confirm: is that what the above code does, without connecting to the redirect target? – Kev1n91 Jun 29 '18 at 09:48
  • _“Some websites will download automatically binary code to your pc upon creating a connection”_ - aha … and how do they do that? Anything that relies on some security flaw in your browser is probably not going to be very effective in the context of your python script, don’t you think? Merely requesting a URL is not going to “execute” anything on your end, unless you specifically implement something to that effect, neither will any external resources embedded by that URL be downloaded automatically. – CBroe Jun 29 '18 at 09:55
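Following the comment about 3xx status codes: the status and Location header of the first response can be read without ever contacting the redirect target. A minimal sketch using the standard-library `http.client`, which never follows redirects on its own; `first_hop` and `REDIRECT_STATUSES` are hypothetical names. Only the host in the original URL is contacted, and JavaScript-based redirects are not detected.

```python
import http.client
from urllib.parse import urlsplit

# HTTP status codes that signal a redirect (hypothetical constant name).
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def first_hop(url, timeout=3):
    """Send one HEAD request to url; return (status, location).

    location is the raw Location header for a redirect, else None.
    http.client does not follow redirects, so the target is never contacted.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)  # netloc may include ":port"
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if resp.status in REDIRECT_STATUSES:
            return resp.status, resp.getheader("Location")
        return resp.status, None
    finally:
        conn.close()
```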

1 Answer


You can send a HEAD request and check the status code. If you are using the third-party requests library, you can do it like this:

import requests

original_url = '...'  # your original url here
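# requests.head() does not follow redirects by default,
# so only the server hosting original_url is contacted.
response = requests.head(original_url, timeout=3)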

if response.is_redirect:
    print("Redirecting")
else:
    print("Not redirecting")
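For a list of millions of URLs, the checks can be run concurrently. Below is a minimal standard-library sketch of the same HEAD-without-following idea, for use without requests; `NoRedirect`, `check`, `check_all` and the worker count are hypothetical choices, not part of any library. It installs a redirect handler that refuses to follow redirects, so a 3xx response surfaces as an `HTTPError` whose Location header can be read.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects; the 3xx response is raised as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # None means "do not issue a request to newurl"


OPENER = urllib.request.build_opener(NoRedirect)
REDIRECT_CODES = {301, 302, 303, 307, 308}


def check(url, timeout=3):
    """Return (url, redirect_location_or_None, error_or_None) after one HEAD."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with OPENER.open(req, timeout=timeout):
            return url, None, None  # 2xx: no redirect
    except urllib.error.HTTPError as err:
        if err.code in REDIRECT_CODES:
            return url, err.headers.get("Location"), None
        return url, None, "HTTP %d" % err.code
    except OSError as err:  # URLError, timeouts, refused connections
        return url, None, str(err)


def check_all(urls, workers=32):
    """Check many URLs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check, urls))
```

Since `pool.map` submits every URL up front, it is probably wiser to call `check_all` on chunks of a few thousand URLs and write the results out between chunks. Some servers reject HEAD with 405; this sketch reports that as an error rather than retrying with GET.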
dopstar