Getting HEAD content with Python Requests

Question

I'm trying to parse the result of a HEAD request done using the Python Requests library, but can't seem to access the response content.

According to the docs, I should be able to access the content from requests.Response.text. This works fine for me on GET requests, but returns None on HEAD requests.

GET request (works)

import requests
response = requests.get(url)
content = response.text

content = <html>...</html>

HEAD request (no content)

import requests
response = requests.head(url)
content = response.text

content = None

EDIT

OK, I've quickly realized from the answers that a HEAD request is not supposed to return content, only headers. But does that mean that, to access things found in the <head> tag of a page, like <link> and <meta> tags, one must GET the whole document?

phihag · Accepted Answer · score 34 · 2012-03-04T13:22:34.787

By definition, the responses to HEAD requests do not contain a message-body.

Send a GET request if you want to, well, get a response body. Send a HEAD request iff you are only interested in the response status code and headers.

HTTP transfers arbitrary content; the HTTP term "header" is completely unrelated to an HTML <head>. However, you can ask the server to send only part of the document. If you know the length of the HTML <head> section (or an upper bound for it), you can include an HTTP Range header in your request that asks the remote server to return only a certain range of bytes. If the remote server supports HTTP ranges, it will serve the reduced response; otherwise it may ignore the header and send the whole document.
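
As a rough illustration of the Range idea, here is a minimal sketch. The names `HeadTagCollector` and `fetch_prefix`, and the 8192-byte default, are made up for this example and are not part of Requests. It assumes the `<head>` section fits into the first few kilobytes. It also falls back to reading only the start of the stream if the server ignores the Range header.

```python
from html.parser import HTMLParser

import requests


class HeadTagCollector(HTMLParser):
    """Collect the attributes of <link> and <meta> tags seen before </head>."""

    def __init__(self):
        super().__init__()
        self.tags = []            # list of (tag_name, {attribute: value}) pairs
        self.head_closed = False  # set once </head> has been seen

    def handle_starttag(self, tag, attrs):
        if not self.head_closed and tag in ("link", "meta"):
            self.tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_closed = True


def fetch_prefix(url, max_bytes=8192):
    """Hypothetical helper: fetch at most the first max_bytes of url."""
    headers = {
        "Range": f"bytes=0-{max_bytes - 1}",
        # Ask for an uncompressed body so the byte range refers to the HTML itself.
        "Accept-Encoding": "identity",
    }
    with requests.get(url, headers=headers, stream=True, timeout=10) as response:
        response.raise_for_status()
        # 206 means the range was honoured; a plain 200 means the server is
        # sending everything, so stop reading once max_bytes have arrived.
        data = b""
        for chunk in response.iter_content(chunk_size=1024):
            data += chunk
            if len(data) >= max_bytes:
                break
        encoding = response.encoding or "utf-8"
    return data[:max_bytes].decode(encoding, errors="replace")
```

To use it, feed the prefix to the parser: `collector = HeadTagCollector()`, then `collector.feed(fetch_prefix(url))`, then read `collector.tags`. If `collector.head_closed` is still false afterwards, the chosen byte limit was probably too small for that page.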

edited Mar 04 '12 at 13:22

answered Mar 04 '12 at 12:48

phihag


OK, my mistake, but then how does one capture things like `<link>` and `meta` tags from a HEAD request, or is that not possible? – Yarin Mar 04 '12 at 12:51

Umm, `<link>` and `<meta>` tags are only present in the HTML **body**. The only headers you can access are the HTTP ones. *Why* do you want to send a HEAD instead of a GET anyways? – phihag Mar 04 '12 at 12:53

phihag- ? `<link>` tags are within the `<head>` section of a doc; view source on this page. I was hoping to get only the `<head>` to reduce time on link scraping. – Yarin Mar 04 '12 at 12:57

You're confusing similar terms in the context of different protocols. HTTP does not know anything about HTML code; it just transfers arbitrary content with headers (for example for the content type or its expiration date). If you know the length of the HTML `<head>`, you can include the [Range](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35) header in your request, but I doubt that will speed things up unless the full HTML code is really huge. – phihag Mar 04 '12 at 13:23

score 10 · Answer 2 · answered Mar 04 '12 at 12:48


A HEAD response doesn't have any content! Try response.headers - that's probably where the action is. An HTTP HEAD request doesn't fetch the <head> element of the HTML document you would get from a GET request. I think that's your mistake.
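
To see this concretely, here is a minimal sketch that sends a HEAD and a GET to a throwaway local server and returns both responses for comparison. `DemoHandler`, `PAGE` and `compare_head_and_get` are invented for the demo. Only the `requests.Session` methods and the `status_code`, `headers` and `content` attributes come from Requests.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Hypothetical page served by the local demo server.
PAGE = b"<html><head><title>demo</title></head><body>hi</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_HEAD(self):
        # Same headers as GET, but no message body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def compare_head_and_get():
    """Start a local server, send HEAD and GET, and return both responses."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    session = requests.Session()
    session.trust_env = False  # ignore proxy environment variables for this local demo
    try:
        head = session.head(url, timeout=5)
        get = session.get(url, timeout=5)
    finally:
        session.close()
        server.shutdown()
        server.server_close()
    return head, get
```

After `head, get = compare_head_and_get()`, I'd expect `head.headers` to carry the same `Content-Type` and `Content-Length` as `get.headers`. With current Requests releases, `head.content` should be empty bytes and `get.content` the full page.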


Spacedman


score 3 · Answer 3 · answered Mar 04 '12 at 12:49


HEAD responses have no body. They only return the HTTP headers, the same ones you would get from a GET request.


dorsh

