Python HTTP HEAD - dealing with redirects properly?

Question

I can use urllib2 to make HEAD requests like so:

import urllib2
request = urllib2.Request('http://example.com')
request.get_method = lambda: 'HEAD'
urllib2.urlopen(request)
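On Python 3, urllib2 became urllib.request, and since Python 3.3 Request takes a method argument, so the get_method lambda is no longer needed. Below is a minimal sketch. The function names are illustrative and example.com is only a placeholder. This alone does not change how redirects are followed.

```python
import urllib.request


def build_head_request(url):
    """Return a Request that will be sent with the HEAD method."""
    # Python 3.3+: method= replaces the urllib2 get_method override.
    return urllib.request.Request(url, method='HEAD')


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return (final URL, response headers).

    Needs network access; redirects are handled by the default opener.
    """
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
        return response.geturl(), response.headers
```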

The problem is that when this follows redirects, it appears to use GET instead of HEAD.

The purpose of this HEAD request is to check the size and content type of the resource at the URL before downloading it, so that I don't end up downloading some huge document. (The URL is supplied by a random internet user through IRC.)
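Once the HEAD response is available, the check itself only needs the Content-Type and Content-Length headers. Below is a sketch that assumes Python 3 and a byte limit and set of allowed types chosen by the caller. looks_safe and read_capped are hypothetical helper names. A server may omit Content-Length or report it incorrectly, so the second helper also caps the bytes actually read during the real download.

```python
def looks_safe(headers, max_bytes, allowed_types):
    """Judge a download from its HEAD response headers.

    headers: response.headers from urlopen (an http.client.HTTPMessage,
    whose .get() is case-insensitive) or a dict using canonical names.
    allowed_types: set of lowercase MIME types, e.g. {'text/html'}.
    """
    # Drop parameters such as '; charset=utf-8' before comparing.
    content_type = (headers.get('Content-Type') or '').split(';')[0].strip().lower()
    if content_type not in allowed_types:
        return False
    length = headers.get('Content-Length')
    if length is None:
        # Size unknown: refusing is a policy choice, not an HTTP rule.
        return False
    try:
        return 0 <= int(length) <= max_bytes
    except ValueError:
        # Malformed Content-Length value.
        return False


def read_capped(stream, max_bytes, chunk_size=65536):
    """Read a file-like body, raising ValueError if it exceeds max_bytes."""
    chunks = []
    total = 0
    while True:
        # Never request more than one byte past the limit.
        chunk = stream.read(min(chunk_size, max_bytes + 1 - total))
        if not chunk:
            break
        chunks.append(chunk)
        total += len(chunk)
        if total > max_bytes:
            raise ValueError('body exceeds %d bytes' % max_bytes)
    return b''.join(chunks)
```

A typical flow is to call looks_safe() on the headers from the HEAD request. Only if it returns True do you perform the GET and pass the response object to read_capped().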

How could I make it use HEAD requests when following redirects?

[Requests](http://docs.python-requests.org/en/latest/index.html) at least claims to do this the right way: it documents its redirect behaviour as working for idempotent methods, and calls out HEAD specifically in the docs. — James Aylett, Apr 01 '12 at 19:41
A similar solution: http://stackoverflow.com/questions/9890815/python-get-headers-only-using-urllib2/9892207#9892207 — newtover, Apr 01 '12 at 21:00

score 21 · Answer 1 · answered Apr 01 '12 at 19:43

You can do this with the requests library:

>>> import requests
>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.url
u'https://github.com/'

jterrace

score 3 · Accepted Answer · edited May 23 '17 at 12:24

Good question! If you're set on using urllib2, you'll want to look at this answer about the construction of your own redirect handler.

In short (read: blatantly stolen from the previous answer):

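import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Hook point: inspect or modify things here (e.g. cookies) before
        # handing off to the default redirect handling.
        print "Cookie Manip Right Here"
        # Note: the default handling builds the follow-up request in
        # redirect_request(), which creates a plain GET; to keep HEAD you
        # would also need to override redirect_request().
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Send the other redirect status codes through the same hook.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookieprocessor = urllib2.HTTPCookieProcessor()

# build_opener accepts handler classes as well as instances; a subclass of
# HTTPRedirectHandler replaces the default one.
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("http://example.com")  # placeholder URL
print response.read()

print cookieprocessor.cookiejar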
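The same redirect-handler approach can be applied to the actual question on Python 3. Below is a minimal sketch that overrides redirect_request(), the HTTPRedirectHandler method that builds the follow-up request. It copies HEAD onto the new request. Depending on the Python version, the stock method may build that request without a method, so it goes out as a GET. Setting it explicitly makes the behaviour independent of the version. HeadRedirectHandler and head() are hypothetical names, and the sketch assumes Python 3.3+ for Request's method argument.

```python
import urllib.request


class HeadRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that keeps HEAD as HEAD when following redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock logic decide whether to follow and build the request.
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None and req.get_method() == 'HEAD':
            # Request.get_method() returns this attribute when it is set.
            new_req.method = 'HEAD'
        return new_req


# Passing the subclass to build_opener replaces the default redirect handler.
opener = urllib.request.build_opener(HeadRedirectHandler)


def head(url, timeout=10):
    """HEAD url, following redirects with HEAD; return (final URL, headers)."""
    request = urllib.request.Request(url, method='HEAD')
    with opener.open(request, timeout=timeout) as response:
        return response.geturl(), response.headers
```

Using opener.open() rather than install_opener() keeps the custom behaviour from affecting other urlopen() calls in the same process. The headers returned are those of the final response after all redirects, which is what a size and type check needs.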

Also, as mentioned in the comments, you can use Python Requests.

I ended up using this redirect handler, based on what you found: http://pastebin.com/m7aN21A7 Thanks! — Krenair, Apr 01 '12 at 20:59
