
I can use urllib2 to make HEAD requests like so:

import urllib2
request = urllib2.Request('http://example.com')
request.get_method = lambda: 'HEAD'
urllib2.urlopen(request)

The problem is that when this follows redirects, it appears to use GET instead of HEAD.

The purpose of this HEAD request is to check the size and content type of the URL I'm about to download, so that I don't end up downloading some huge document. (The URL is supplied by a random Internet user through IRC.)
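A minimal sketch of the size and content-type check described above, written for Python 3. It assumes the server sends an honest `Content-Length` and `Content-Type` on the HEAD response (neither is guaranteed), and the function name, the 5 MiB limit, and the allowed-type set are hypothetical choices, not part of the original question.

```python
# Hypothetical policy values; tune them for your bot.
MAX_BYTES = 5 * 1024 * 1024          # refuse anything over 5 MiB
ALLOWED_TYPES = {"text/html", "text/plain"}


def is_safe_to_download(headers, max_bytes=MAX_BYTES, allowed_types=ALLOWED_TYPES):
    """Decide from HEAD response headers whether a follow-up GET looks safe.

    `headers` is any mapping with .get(), such as response.headers from
    urllib (case-insensitive) or a plain dict (exact-case keys needed).
    """
    # Content-Type may carry parameters, e.g. "text/html; charset=utf-8".
    content_type = (headers.get("Content-Type") or "").split(";")[0].strip().lower()
    if content_type not in allowed_types:
        return False

    length = headers.get("Content-Length")
    if length is None:
        # No declared size: treat as unsafe rather than guess.
        return False
    try:
        return 0 <= int(length) <= max_bytes
    except ValueError:
        # Malformed header value.
        return False
```

Rejecting responses with no `Content-Length` is a deliberately conservative choice; a server using chunked transfer encoding will fail the check even if the body is small.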

How could I make it use HEAD requests when following redirects?

Krenair
  • [Requests](http://docs.python-requests.org/en/latest/index.html) at least claims to do this the right way (it documents its redirect behaviour as working for idempotent methods, and calls out HEAD specifically in the docs). – James Aylett Apr 01 '12 at 19:41
  • a similar solution: http://stackoverflow.com/questions/9890815/python-get-headers-only-using-urllib2/9892207#9892207 – newtover Apr 01 '12 at 21:00

2 Answers


You can do this with the requests library:

>>> import requests
>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r
<Response [200]>
>>> r.history
[<Response [301]>]
>>> r.url
u'https://github.com/'
jterrace

Good question! If you're set on using urllib2, you'll want to look at this answer about constructing your own redirect handler.

In short (read: blatantly stolen from that answer):

import urllib2

class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Hook point: inspect or modify the redirect here (the original
        # answer used this spot for cookie handling).
        print "Cookie Manip Right Here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    # Route the other redirect status codes through the same hook.
    http_error_301 = http_error_303 = http_error_307 = http_error_302

# Keeps cookies set by the server across the redirect chain.
cookieprocessor = urllib2.HTTPCookieProcessor()

# Passing a subclass of HTTPRedirectHandler replaces the default handler.
opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)

response = urllib2.urlopen("http://example.com/")  # replace with your URL
print response.read()

print cookieprocessor.cookiejar
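The handler above only shows where to hook in; the stock urllib2 `redirect_request` still builds the follow-up request without the HEAD override, so it goes out as GET. A minimal sketch of a handler that keeps HEAD across redirects, written against Python 3's `urllib.request` (the successor to urllib2). It assumes overriding `redirect_request`, which the stock `HTTPRedirectHandler` calls to build each follow-up request, is enough for your case; the class name and the `head` helper are hypothetical.

```python
import urllib.request


class HeadPreservingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects without downgrading HEAD to GET."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Let the stock handler validate the redirect and build the new
        # Request (it copies headers and raises HTTPError when refusing).
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        # Older Python versions rebuild the follow-up request as GET;
        # force HEAD back on when the original request was a HEAD.
        if new_req is not None and req.get_method() == "HEAD":
            new_req.method = "HEAD"
        return new_req


def head(url, timeout=10):
    """Send a HEAD request, following redirects as HEAD (hypothetical helper)."""
    # A subclass of HTTPRedirectHandler replaces the default one in the opener.
    opener = urllib.request.build_opener(HeadPreservingRedirectHandler)
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        # geturl() reports the final URL after any redirects.
        return response.status, response.geturl(), response.headers
```

For example, `status, final_url, headers = head("http://example.com/")` returns the final status, URL, and headers without ever issuing a GET, so `headers.get("Content-Length")` can be checked before deciding to download.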

Also, as mentioned in the comments, you can use Python Requests.

MrGomez
  • I ended up using this redirect handler, based on what you found: http://pastebin.com/m7aN21A7 Thanks! – Krenair Apr 01 '12 at 20:59