47

I am trying to log into a site using Python, but the site sends a cookie and a redirect in the same response. Python follows the redirect, which prevents me from reading the cookie sent by the login page. How do I prevent Python's urllib (or urllib2) urlopen from following the redirect?

Jack Edmonds
  • Duplicate: http://stackoverflow.com/questions/110498/is-there-an-easy-way-to-request-a-url-in-python-and-not-follow-redirects/110808 – S.Lott Feb 16 '09 at 20:56
  • a similar question: http://stackoverflow.com/questions/9890815/python-get-headers-only-using-urllib2 – newtover Mar 28 '12 at 11:28
  • For readers who don't care about using urllib specifically: `requests` supports this "out of the box". https://stackoverflow.com/questions/110498/is-there-an-easy-way-to-request-a-url-in-python-and-not-follow-redirects – Att Righ May 17 '22 at 10:17

4 Answers

33

You could do a couple of things:

  1. Build your own HTTPRedirectHandler that intercepts each redirect
  2. Create an instance of HTTPCookieProcessor and install that opener so that you have access to the cookiejar.

Here is a quick example that shows both:

import urllib2

#redirect_handler = urllib2.HTTPRedirectHandler()

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("WHEREEVER")
print response.read()

print cookieprocessor.cookiejar
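
On Python 3, urllib2 was split into urllib.request and urllib.error, and cookielib became http.cookiejar. The following is a minimal sketch of the same two ideas under that assumption. The names LoggingRedirectHandler and build_logging_opener are hypothetical and not part of the answer above; the handler only records what each redirect response carried and then follows it as usual.

```python
import http.cookiejar
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler: records each redirect, then follows it."""

    def __init__(self):
        super().__init__()
        # One (status code, Location, Set-Cookie headers) tuple per redirect.
        self.seen = []

    def http_error_302(self, req, fp, code, msg, headers):
        # The redirect response is available here, before it is followed.
        self.seen.append(
            (code, headers.get("Location"), headers.get_all("Set-Cookie") or [])
        )
        return super().http_error_302(req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


def build_logging_opener():
    """Return (opener, redirect_handler, cookie_jar) wired together."""
    jar = http.cookiejar.CookieJar()
    redirects = LoggingRedirectHandler()
    opener = urllib.request.build_opener(
        redirects, urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, redirects, jar
```

After opener.open(url) returns, redirects.seen lists the intermediate responses and jar holds the cookies set along the way, including those sent with the redirect itself.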
pope
  • You don't seem to be using `redirect_handler = urllib2.HTTPRedirectHandler()` in the example at all. Were you going to show a second example? – Ehtesh Choudhury Aug 16 '11 at 21:13
  • You are correct, I'm not using the redirect_handler. Instead, I created my own redirect handler. I will edit to remove. – pope Aug 23 '11 at 04:38
  • Why is it you do not need to instantiate the `MyHTTPRedirectHandler`, but rather pass the class into the `build_opener()` method? – neydroydrec Jan 09 '12 at 20:10
  • From the documentation: handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Since MyHTTPRedirectHandler doesn't have a constructor with any arguments, I can pass it in as is. – pope Jan 12 '12 at 01:43
30

If all you need is to stop redirection, there is a simple way to do it. For example, I only want to read the cookies, and for better performance I don't want to be redirected to any other page. I also want the response to keep its 3xx status code. Let's use 302 as an example.

import cookielib
import urllib2

class MyHTTPErrorProcessor(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        code, msg, hdrs = response.code, response.msg, response.info()

        # Only add this check to stop 302 redirection.
        if code == 302:
            return response

        if not (200 <= code < 300):
            response = self.parent.error(
                'http', request, response, code, msg, hdrs)
        return response

    https_response = http_response

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), MyHTTPErrorProcessor)

In this way, you don't even need to go into urllib2.HTTPRedirectHandler.http_error_302()

An even more common case is that we simply want to stop redirection altogether:

class NoRedirection(urllib2.HTTPErrorProcessor):

    def http_response(self, request, response):
        return response

    https_response = http_response

And normally use it this way:

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirection, urllib2.HTTPCookieProcessor(cj))
data = {}
# Passing data makes this a POST request.
response = opener.open('http://www.example.com', urllib.urlencode(data))
if response.code == 302:
    redirection_target = response.headers['Location']
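
For Python 3 readers, the same pass-through processor might look like the sketch below, assuming the urllib.request and http.cookiejar module names. The helper open_without_redirect is a hypothetical name introduced only for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request


class NoRedirection(urllib.request.HTTPErrorProcessor):
    """Return every response unchanged instead of dispatching on its status."""

    def http_response(self, request, response):
        return response

    https_response = http_response


def open_without_redirect(url, data=None):
    """Hypothetical helper: open url once and return (response, cookie_jar)."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        NoRedirection, urllib.request.HTTPCookieProcessor(jar)
    )
    # A non-empty data dict turns the request into a form-encoded POST.
    body = urllib.parse.urlencode(data).encode("ascii") if data else None
    response = opener.open(url, body)
    return response, jar
```

With this in place, response.status is the original 3xx code and response.headers['Location'] is the redirect target. Note that because the processor returns everything unchanged, 4xx and 5xx responses are also returned rather than raised as HTTPError.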
Alan Duan
  • Just what I needed, and very concise `class NoRedirection()` - you don't even have to store `code, msg, hdrs` -- Thanks Alan. – xtof pernod Sep 20 '13 at 15:07
  • You are right! And I removed the line as you suggested. Thanks Xtof. – Alan Duan Sep 24 '13 at 02:26
  • Is it possible to use this approach to get hold of the actual redirect URL? – AdjunctProfessorFalcon Jul 10 '15 at 05:33
  • @Malvin9000 If you want to get the target of the redirection, then yes, just read response.headers['Location'], you will get it:) – Alan Duan Jul 10 '15 at 06:10
  • @Malvin9000 Not literally using read, you can assign it to a new variable or directly print it out. Let me update the answer so you can see. – Alan Duan Jul 10 '15 at 06:16
  • @AlanDuan Thanks a lot for the edit update, much appreciated. When I print `redirection_target` I see the URL I'm inserting in `opener.open()` instead of the new URL that appears in my browser when I cut-and-paste the original URL. Not sure what I'm doing wrong... – AdjunctProfessorFalcon Jul 10 '15 at 06:27
  • @Malvin9000 Most probably it redirects to itself. This happens when the URL supports both GET and POST: when you POST data that is not accepted, it redirects back to itself using GET. To see exactly what happens, you can use the developer tools in Chrome or Firefox to trace every step (open them with CTRL+SHIFT+I in Chrome, then select the Network tab). – Alan Duan Jul 10 '15 at 06:34
  • @AlanDuan This [post](http://stackoverflow.com/questions/31330968/capture-all-live-http-header-data/31331050?noredirect=1#comment50648982_31331050) is pretty much exactly what I'm trying to accomplish, same HTTP header data, etc, trying to get that value of `location` — but maybe it's not possible using raw requests. – AdjunctProfessorFalcon Jul 10 '15 at 06:44
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/82892/discussion-between-alan-duan-and-malvin9000). – Alan Duan Jul 10 '15 at 06:54
12

urllib2.urlopen calls build_opener() which uses this list of handler classes:

handlers = [ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor]

You could try calling urllib2.build_opener(handlers) yourself with a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL. If you really dislike redirects, you could even call urllib2.install_opener(opener) to install your own non-redirecting opener globally.
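
One caveat: the documentation for build_opener() says it adds the default handlers to whatever you pass, unless you pass an instance or subclass of them, so simply leaving HTTPRedirectHandler out of the argument list may not remove it. A possible workaround, sketched here for Python 3 (urllib.request), is to assemble an OpenerDirector by hand. The function names are hypothetical, and FTPHandler and FileHandler are left out for brevity.

```python
import urllib.error
import urllib.request


def build_opener_without_redirects():
    """Hypothetical helper: an opener with no HTTPRedirectHandler installed."""
    opener = urllib.request.OpenerDirector()
    for handler_class in (
        urllib.request.ProxyHandler,
        urllib.request.UnknownHandler,
        urllib.request.HTTPHandler,
        urllib.request.HTTPSHandler,  # assumes Python was built with ssl support
        urllib.request.HTTPDefaultErrorHandler,
        urllib.request.HTTPErrorProcessor,
    ):
        opener.add_handler(handler_class())
    return opener


def fetch_redirect_target(opener, url):
    """Return (status, Location) if url answers with a redirect, else None."""
    try:
        opener.open(url)
    except urllib.error.HTTPError as err:
        # With no redirect handler, a 3xx response surfaces as an HTTPError.
        if err.code in (301, 302, 303, 307, 308):
            return err.code, err.headers.get("Location")
        raise
    return None
```

The design difference from a pass-through error processor is that redirects arrive as exceptions here, so the headers (including any Set-Cookie) are read from the HTTPError object.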

It sounds like your real problem is that urllib2 isn't doing cookies the way you'd like. See also "How to use Python to login to a webpage and retrieve cookies for later usage?"

joeforker
  • *You could try calling urllib2.build_opener(handlers) yourself with a list that omits HTTPRedirectHandler, then call the open() method on the result to open your URL.* Well, the docs for urllib2.build_opener() say this: *Instances of the following classes **will be in front of the handlers**, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler, UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.* It looks like omitting `HTTPRedirectHandler` won't work... – Piotr Dobrogost Apr 01 '11 at 17:57
4

This question was asked before here.

EDIT: If you have to deal with quirky web applications you should probably try out mechanize. It's a great library that simulates a web browser. You can control redirecting, cookies, page refreshes... If the website doesn't rely [heavily] on JavaScript, you'll get along very nicely with mechanize.

paprika