How to detect when a site check redirects to another page using the requests module?

Question

For example, suppose I go to www.yahoo.com/thispage, and Yahoo has set up a filter that redirects /thispage to /thatpage, so anyone who goes to /thispage lands on /thatpage.

If I use httplib/requests/urllib, will it know that there was a redirection? What about error pages? Some sites redirect the user to /errorpage whenever the page cannot be found.

What is the problem you are trying to solve? How is your code not doing the right thing? If you merely want to know about error modes, test this behaviour yourself. — Marcin, Nov 20 '12 at 21:52
Check http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect — OneOfOne, Nov 20 '12 at 21:53
@Marcin I have a huge list (1k+) of URLs to test whether they are up or not. I randomly chose 40-50 of them to test manually, and I see that some get redirected to an error page whenever a page cannot be found. I also see many URLs being redirected because the URL pattern has changed: same names, just written differently. — iCodeLikeImDrunk, Nov 20 '12 at 22:01
@OneOfOne that sorta looks like what i need, ill check it out. thanks! — iCodeLikeImDrunk, Nov 20 '12 at 22:02

score 29 · Accepted Answer · edited Jun 09 '22 at 05:02

With requests, any redirects that were followed are listed in the .history attribute of the response object. It is a Python list of response objects, ordered from the oldest to the most recent. See the documentation for more.
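A minimal sketch of reading .history, in case it helps. The helper names (redirect_chain, was_redirected, fetch_chain) and the 10-second timeout are my own choices for illustration, not part of requests.

```python
import requests


def redirect_chain(response):
    """Return [(status_code, url), ...] for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def was_redirected(response):
    """True if requests followed at least one HTTP redirect to get this response."""
    return len(response.history) > 0


def fetch_chain(url):
    # Hypothetical helper; the 10-second timeout is an arbitrary choice.
    response = requests.get(url, timeout=10)
    return redirect_chain(response)
```

Calling fetch_chain('http://www.yahoo.com/thispage') would list each intermediate response followed by the page you finally landed on. An empty .history only means no HTTP-level redirect was followed; it says nothing about redirects done inside the HTML.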

Employee

answered Nov 20 '12 at 22:03

MikeHunter

yonilevy · Answer 2 · 2012-11-21T18:13:41.553

19

To prevent requests from following redirects use:

r = requests.get('http://www.yahoo.com/thispage', allow_redirects=False)

If it is indeed a redirect, you can check the redirect target location in r.headers['location'].
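Here is a rough sketch of how that check might look in practice. The function names and the timeout value are hypothetical. The only requests features relied on are allow_redirects=False, the is_redirect property, and the case-insensitive headers mapping.

```python
from urllib.parse import urljoin

import requests


def redirect_target(response):
    """Return the absolute redirect target, or None if the response is not a redirect."""
    # is_redirect is True for a redirect status code that comes with a Location header.
    if not response.is_redirect:
        return None
    # Location may be relative, so resolve it against the URL that was requested.
    return urljoin(response.url, response.headers["location"])


def check_without_following(url):
    # Hypothetical helper; the 10-second timeout is an arbitrary choice.
    response = requests.get(url, allow_redirects=False, timeout=10)
    return response.status_code, redirect_target(response)
```

With this, check_without_following('http://www.yahoo.com/thispage') gives you the raw status code of the first response plus the place it points to, so you can decide yourself whether the target looks like an error page.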

edited Nov 21 '12 at 18:13

answered Nov 20 '12 at 22:06

yonilevy

score 3 · Answer 3 · answered Nov 25 '14 at 04:44

The accepted answer is the correct first option, but in some cases, if the site redirects with a meta tag, it also specifies a canonical link on the page you land on. In this example I request http://en.wikipedia.org/wiki/Google_Inc_Class_A from Wikipedia, which is a URL that redirects.

>> import requests
>> request = requests.get('http://en.wikipedia.org/wiki/Google_Inc_Class_A')

I check and:

>> request.history
[]

An alternative is to try to pull the canonical URL, which should hopefully be the page you've been redirected to. (Note that I'm using BeautifulSoup here as well.)

>> from bs4 import BeautifulSoup
>> soup = BeautifulSoup(request.content, 'html.parser')
>> canonical = soup.find('link', {'rel': 'canonical'})
>> canonical['href']
'http://en.wikipedia.org/wiki/Google'

This does match the URL you get redirected to in this particular case. To be clear, this is an ugly second option, but it is worth trying if all else fails.

For future readers: I just checked this example and the history is correctly populated: `requests.get('http://en.wikipedia.org/wiki/Google_Inc_Class_A', allow_redirects=True)`. I don't know if it's due to "allow_redirects" parameters or to a new version of requests package. — Alberto Coletta, Jul 20 '16 at 15:41

score 1 · Answer 4 · answered Nov 20 '12 at 22:05

It depends on how they are doing the redirection. The "right" way is to return a redirect HTTP status code (301/302/303). The "wrong" way is to place a refresh meta tag in the HTML.

If they do the former, requests will handle it transparently. Note that any sane error page redirect will still have an error status code (e.g. 404) which you can check as response.status_code.
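The "wrong" kind of redirect never shows up at the HTTP level, so you would have to look inside the HTML yourself. Below is a rough sketch that uses only the standard library's html.parser (not requests or BeautifulSoup) to pull the target out of a refresh meta tag. The class and function names are made up for illustration, and it assumes the usual content="N; url=..." form.

```python
from html.parser import HTMLParser


class _MetaRefreshFinder(HTMLParser):
    """Remembers the URL of the first <meta http-equiv="refresh"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "5; url=/thatpage"; the URL part is optional.
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def meta_refresh_target(html):
    """Return the meta-refresh URL found in html, or None if there is none."""
    finder = _MetaRefreshFinder()
    finder.feed(html)
    return finder.target
```

You could call meta_refresh_target(response.text) on a response that came back with a 200 status code and no HTTP redirect. The returned URL may be relative, and redirects done in JavaScript will not be caught by this.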
