Retrieving the headers of a file (resource) using urllib in Python 3

Question

In Python 3.x, how can I get headers such as Content-Length before downloading a file? This

import urllib.request

aaa, bbb = urllib.request.urlretrieve(url, file_name)
bbb['Content-Length']
# or

ccc, ddd = urllib.request.urlretrieve(url)
ddd['Content-Length']

seems to download the whole file first and only then return its headers. I think so because it takes a long time and returns the name of either the temp file (the 2nd case) or the real one (the 1st case).

Or am I wrong?

What I want is to retrieve Content-Length first and then, depending on some condition, either download and save the file or do nothing with it.

score 0 · Accepted Answer · edited Oct 07 '21 at 06:11

HTTP has a special method named HEAD that tells the server to send only the headers (including Content-Length, if any) and stop right before sending the actual content.

To make urllib use the HEAD method, switch from urlretrieve to the urlopen function with a Request object:

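from urllib.request import Request, urlopen

# method='HEAD' asks the server for the headers only, without the body
resp = urlopen(Request('http://httpbin.org/html', method='HEAD'))
headers = resp.info()  # the same object is available as resp.headers
print(headers['Content-Length'])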

To be precise, the HTTP specification doesn’t require Content-Length in a response to a HEAD request (or at all), but most servers, especially those that serve files, will include it.
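
The "check first, then decide" flow from the question could be sketched roughly as follows. This is only an illustration, not part of the original answer: the names content_length, download_if_small, and MAX_BYTES are hypothetical, and the 10 MiB limit stands in for whatever condition you actually need. Because Content-Length may be missing, the sketch treats an unknown size as "do not download".

```python
from urllib.request import Request, urlopen, urlretrieve

MAX_BYTES = 10 * 1024 * 1024  # hypothetical threshold (10 MiB), an assumption


def content_length(headers):
    """Return Content-Length as an int, or None if it is absent or malformed."""
    value = headers.get('Content-Length')
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def download_if_small(url, file_name, max_bytes=MAX_BYTES):
    """Send a HEAD request first; download only if the reported size fits."""
    with urlopen(Request(url, method='HEAD')) as resp:
        size = content_length(resp.headers)
    if size is None or size > max_bytes:
        return False  # size unknown or too large: skip the download
    urlretrieve(url, file_name)  # second request: the actual GET
    return True
```

A call such as download_if_small(url, file_name) returns True only when the file was actually saved.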

answered Oct 23 '15 at 19:37

Vasiliy Faronov

Will I have to send two requests to the server, then? – Oct 23 '15 at 19:38
@jawanam Yes (if you decide to go ahead and download the file). – Vasiliy Faronov Oct 23 '15 at 19:39
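
As a side note to the comments above, a single GET request may also be enough in many cases: urlopen returns once the response headers have arrived, and the body is only read when you ask for it. The sketch below relies on that behavior; within_limit and fetch_if_small are hypothetical names, and the server may still have started sending some of the body before the connection is closed.

```python
import shutil
from urllib.request import urlopen


def within_limit(headers, max_bytes):
    """True only if Content-Length is present, numeric, and within max_bytes."""
    value = headers.get('Content-Length')
    return value is not None and value.isdigit() and int(value) <= max_bytes


def fetch_if_small(url, file_name, max_bytes):
    """Open the URL once; save the body only if the headers look acceptable."""
    with urlopen(url) as resp:
        if not within_limit(resp.headers, max_bytes):
            return False  # leaving the with-block closes the connection unread
        with open(file_name, 'wb') as out:
            shutil.copyfileobj(resp, out)  # stream the body to disk in chunks
    return True
```

Whether this saves anything in practice depends on the server and on how much data is already in flight when the response is closed.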
