
In Python 3.x, how can I get headers such as Content-Length before downloading a file? This code:

import urllib.request

aaa, bbb = urllib.request.urlretrieve(url, file_name)
bbb['Content-Length']
# or

ccc, ddd = urllib.request.urlretrieve(url)
ddd['Content-Length']

seems to download the whole file first and only then return its headers. I think so because it takes a long time and returns the name of either a temporary file (the second case) or the real one (the first case).

Or am I wrong?

What I want is to retrieve Content-Length first and then, depending on some condition, either download and save the file or do nothing with it.

Vasiliy Faronov

1 Answer


You are right: urlretrieve downloads the whole body before it returns. HTTP has a special method named HEAD that tells the server to send only the headers (including Content-Length, if any) and stop right before sending the actual content.
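To show what a HEAD request amounts to without urllib, here is a minimal sketch using the standard library's lower-level http.client module. The function name head_content_length is hypothetical, and the sketch assumes a plain http:// or https:// URL; it returns the reported size as an int, or None if the server sends no Content-Length.

```python
import http.client
from urllib.parse import urlsplit


def head_content_length(url, timeout=10):
    """Send a HEAD request to url and return Content-Length as an int, or None."""
    parts = urlsplit(url)
    # Pick the connection class from the URL scheme; port None means the default.
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        # http.client knows a HEAD response has no body, so it does not wait for one.
        value = resp.getheader("Content-Length")
        return int(value) if value is not None else None
    finally:
        conn.close()
```

No body bytes are transferred here, which is what makes HEAD cheap compared with a full download.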

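urlretrieve has no way to send a HEAD request, so to have urllib use HEAD you should switch from the legacy urlretrieve interface to urlopen and pass it a Request object with method='HEAD' (the method argument is available since Python 3.3):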

from urllib.request import Request, urlopen

# HEAD asks the server for the headers only; the with block closes the connection
with urlopen(Request('http://httpbin.org/html', method='HEAD')) as resp:
    headers = resp.info()
print(headers['Content-Length'])

To be precise, the HTTP specification doesn’t require Content-Length in a response to a HEAD request (or at all), but most servers, especially those that serve files, will include it.
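Putting this together for the workflow in the question (check the size first, then download or skip), a minimal sketch could look like the following. It assumes the "some condition" is a size limit; the names remote_size, download_if_small and max_bytes are hypothetical. When the server sends no Content-Length, this sketch chooses to skip the download rather than guess.

```python
import shutil
from urllib.request import Request, urlopen


def remote_size(url, timeout=10):
    """Return the Content-Length reported for a HEAD request, or None if absent."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None


def download_if_small(url, file_name, max_bytes, timeout=10):
    """Save url to file_name only if its reported size is at most max_bytes.

    Returns True if the file was saved, False if the download was skipped.
    """
    size = remote_size(url, timeout)
    if size is None or size > max_bytes:
        return False
    with urlopen(url, timeout=timeout) as resp, open(file_name, "wb") as out:
        # Stream the body to disk in chunks instead of reading it all into memory.
        shutil.copyfileobj(resp, out)
    return True
```

Some servers reject HEAD (urlopen then raises urllib.error.HTTPError, for example with 405). In that case a plain GET with urlopen can serve as a fallback: urlopen returns once the status line and headers have been read, so resp.headers can be inspected before any of the body is read.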

Vasiliy Faronov