118

What I'm trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?

fuentesjr

11 Answers

109

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html
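
Since the goal is to read the MIME type, a minimal sketch of pulling it out of those headers might look like this (the values shown are illustrative; in Python 2, response.info() returns a mimetools.Message-like object):

>>> response.info().gettype()
'text/html'
>>> response.info().getheader('Content-Type')
'text/html; charset=ISO-8859-1'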
doshea
  • response.info().__str__() will return the string format of the headers, in case you want to do something with the result you get. – Shane Oct 12 '10 at 12:17
  • Except that when trying this with Python 2.7.1 (Ubuntu Natty), if there's a redirect, it does a GET on the destination, not a HEAD... – eichin Aug 23 '11 at 04:37
  • That's the advantage of `httplib.HTTPConnection`, which doesn't handle redirects automatically. – Ehtesh Choudhury Oct 04 '11 at 06:59
  • But with doshea's answer, how do you set the timeout? And how do you handle bad URLs, i.e., URLs that are no longer alive? – fanchyna Aug 19 '13 at 17:31
105

Edit: This answer works, but nowadays you should just use the requests library, as mentioned in other answers below.

Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There's also a getheader(name) to get a specific header.
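
For example, a quick sketch of pulling just the MIME type (the exact value depends on the server):

>>> res.getheader('content-type')
'text/html; charset=ISO-8859-1'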

Eevee
75

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers
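
Note that a HEAD response has no body, so `resp.text` will normally be empty; the MIME type lives in the headers. A minimal sketch of reading it (requests treats header names case-insensitively):

import requests

resp = requests.head("http://www.google.com")
print resp.headers.get('content-type')  # e.g. 'text/html; charset=ISO-8859-1'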
K Z
36

I believe the Requests library should be mentioned as well.
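
A minimal sketch of that approach, using `allow_redirects` (whose behavior across versions is discussed in the comments below) to stop at the first response:

import requests

r = requests.head('http://github.com', allow_redirects=False)
print r.status_code, r.headers.get('location')  # e.g. 301 and the redirect target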

daliusd
  • This answer deserves more attention. Looks like a pretty good library that makes the problem trivial. – Nick Retallack Oct 27 '11 at 00:00
  • I agree. It was very simple to make the request: `import requests; r = requests.head('http://github.com')` – Luis R. Nov 17 '11 at 19:45
  • @LuisR.: if there is a redirect, then it follows it with GET/POST/PUT/DELETE as well. – jfs Feb 10 '12 at 13:40
  • @Nick Retallack: there is no easy way to disable redirects. `allow_redirects` can disable only POST/PUT/DELETE redirects. Example: [head request no redirect](http://hastebin.com/hokutehopu.py) – jfs Feb 10 '12 at 14:01
  • @J.F.Sebastian The link to your example seems to be broken. Could you elaborate on the issue with following redirects? – Piotr Dobrogost Aug 30 '12 at 18:13
  • @Piotr: The issue was that `requests.head(URL)` didn't stop on a redirect and made additional GET requests. The current version, 0.13.9, doesn't do that anymore (at least for 301 and 302 redirects). – jfs Aug 30 '12 at 20:32
  • @J.F.Sebastian It seems it was fixed in revision [6f57352](https://github.com/kennethreitz/requests/commit/6f5735274b9ce2c61345adf8d7657b01b1623320) – Piotr Dobrogost Aug 31 '12 at 07:17
  • @Piotr: something else also changed. As I said above, `allow_redirects` worked by enabling redirects for POST, i.e., it had no effect on HEAD. – jfs Aug 31 '12 at 13:47
  • See http://stackoverflow.com/questions/2018026/should-i-use-urllib-or-urllib2-or-requests for a comparison of the different libraries that could be used for this. Requests seems to be the most popular. – brita_ Apr 29 '14 at 14:08
17

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda: 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I've just come to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
# request() returns a (response headers, content) tuple; header values are strings
resp, content = h.request("http://www.google.com", 'HEAD')
assert resp['status'] == '200'
assert resp['content-type'].startswith('text/html')
...


Paweł Prażak
  • Slightly nasty in that you're leaving get_method as an unbound function rather than binding it to `request`. (Viz, it'll work, but it's bad style, and if you wanted to use `self` in it - tough.) – Chris Morgan Dec 12 '10 at 12:53
  • Could you elaborate a bit more on the pros and cons of this solution? I'm not a Python expert, as you can see, so I could benefit from knowing when it can turn bad ;) As far as I understand, the concern is that it's a hack that may or may not work depending on implementation changes? – Paweł Prażak Dec 12 '10 at 13:54
  • The second version of this code is the only one that worked for me for a URL with a 403 Forbidden. Others were throwing an exception. – duality_ Apr 11 '13 at 15:16
12

For completeness, here is a Python 3 answer equivalent to the accepted answer using httplib.

It is basically the same code, just that the library isn't called httplib anymore but http.client:

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)
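
To get at the MIME type, http.client.HTTPResponse also provides getheader(); a quick sketch (the value shown is illustrative):

print(res.getheader('Content-Type'))  # e.g. 'text/html; charset=ISO-8859-1'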
Octavian Helm
2
import httplib
import urlparse

def unshorten_url(url):
    """Follow a single redirect via a HEAD request and return the target URL."""
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path or '/')  # fall back to '/' for bare domains
    response = h.getresponse()
    # Any 3xx status with a Location header is a redirect.
    if response.status // 100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url
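
Usage might look like this (the short URL and its expansion are hypothetical):

>>> unshorten_url('http://bit.ly/abc123')  # hypothetical short link
'http://example.com/target-page'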
jcomeau_ictx
  • +1 for the `urlparse` - together with `httplib` they give the comfort of `urllib2`, when dealing with URLs on the input side. – Tomasz Gandor Jan 10 '13 at 10:47
1

I have found that httplib is slightly faster than urllib2. I timed two programs - one using httplib and the other using urllib2 - sending HEAD requests to 10,000 URLs. The httplib one was faster by several minutes. httplib's total stats were: real 6m21.334s, user 0m2.124s, sys 0m16.372s.

And urllib2's total stats were: real 9m1.380s, user 0m16.666s, sys 0m28.565s.

Does anybody else have input on this?

IgorGanapolsky
  • Input? The problem is IO-bound and you're using blocking libraries. Switch to eventlet or twisted if you want better performance. The limitations of urllib2 you mention are CPU-bound. – Devin Jeanpierre Aug 13 '10 at 01:04
  • urllib2 follows redirects, so if some of your URLs redirect, that will probably be the reason for the difference. And httplib is more low-level; urllib2 does parse the URL, for example. – Marian Aug 25 '10 at 22:05
  • urllib2 is just a thin layer of abstraction on top of httplib; I'd be very surprised if you were CPU-bound unless the URLs are on a very fast LAN. Is it possible some of the URLs were redirects? urllib2 will follow the redirects whereas httplib would not. The other possibility is that the network conditions (anything you don't have explicit control of in this experiment) fluctuated between the two runs. You should do at least 3 interleaved runs of each to reduce this likelihood. – John La Rooy Feb 20 '11 at 20:30
1

As an aside, when using httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue a read on the response, you are unable to send another request on the connection; you will need to open a new one, or accept a long delay between requests.
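
A sketch of working around that constraint by opening a fresh connection per HEAD request (the hostnames are illustrative):

import httplib

for host in ['www.google.com', 'www.example.com']:  # illustrative hosts
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', '/')
    res = conn.getresponse()
    print host, res.status, res.getheader('content-type')
    conn.close()  # don't reuse the connection; see the caveat above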

0

And yet another approach (similar to Paweł's answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbound functions at the instance level.

estani
-4

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()['content-type'], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.

  • However, urllib will do a GET, and the question is about performing a HEAD. Maybe the poster does not want to retrieve an expensive document. – Philippe F May 06 '09 at 08:30