I use Python 3.4 and the requests library to download a file from an HTTP server. Before downloading, I want to check the file size and do something different (e.g. abort) when the size exceeds a given limit. I know this is easy to check if the server provides the content-length attribute in the header - however, the server I use doesn't send one.
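For reference, if the header were present, the check would be as simple as the sketch below (the URL is hypothetical; only the standard requests API is used):

import requests

# A minimal sketch, assuming the server sends Content-Length (mine doesn't).
response = requests.head('http://example.com/somefile.bin')  # hypothetical URL
size = response.headers.get('content-length')
if size is not None and int(size) > 1024 * 1024 * 200:  # 200 MB limit
    print('File too big, aborting.')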
According to this great article on exception handling with requests, the file size can be checked before saving to disk by downloading only the headers (stream=True) and then iterating over the content without actually writing it anywhere. This approach is used in my code below.
However, I got the impression that I can only iterate over the content once (to check the file size) before the connection is closed. There is nothing like seek(0) to reset the stream to the beginning so that I could iterate a second time and save the file to disk this time. When I try this (as in my code below), I end up with a 0 KB file on my hard disk.
import requests
from contextlib import closing

# Create a custom exception.
class ResponseTooBigException(requests.RequestException):
    """The response is too big."""

# Maximum file size and download chunk size.
TOO_BIG = 1024 * 1024 * 200  # 200 MB
CHUNK_SIZE = 1024 * 128

# Connect to a test server. stream=True ensures that only the headers are
# downloaded here; the body is fetched lazily via iter_content().
response = requests.get('http://leil.de/di/files/more/testdaten/25mb.test', stream=True)

try:
    # Iterate over the response's content without actually saving it to disk.
    with closing(response) as r:
        content_length = 0
        for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
            # Count len(chunk) rather than CHUNK_SIZE: the last chunk is
            # usually smaller than CHUNK_SIZE.
            content_length += len(chunk)
            # Do not download the file if it is too big.
            if content_length > TOO_BIG:
                raise ResponseTooBigException(response=response)
        else:
            # If the file is not too big, this should save the response
            # to disk. However, the result is a 0 KB file.
            print('File size ok. Downloading...')
            with open('downloadedFile.test', 'wb') as f:
                for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
                    if chunk:
                        f.write(chunk)
                        f.flush()
except ResponseTooBigException as e:
    print('The HTTP response was too big (> 200 MB).')
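The obvious fallback I can think of is to count and save in a single pass, deleting the partial file when the limit is exceeded - a sketch of that idea is below (my own idea, reusing the constants and exception from above) - but I would prefer not to write anything to disk at all for oversized responses:

import os

# Single-pass fallback: write while counting, remove the partial file on abort.
response = requests.get('http://leil.de/di/files/more/testdaten/25mb.test', stream=True)
try:
    with closing(response) as r, open('downloadedFile.test', 'wb') as f:
        content_length = 0
        for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
            content_length += len(chunk)
            if content_length > TOO_BIG:
                raise ResponseTooBigException(response=response)
            f.write(chunk)
except ResponseTooBigException:
    os.remove('downloadedFile.test')  # clean up the partial download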
I already tried to make a copy of the response first with

import copy
response_copy = copy.copy(response)

and then use response_copy in the line

with closing(response_copy) as r:

while keeping response in the line

for chunk in response.iter_content(chunk_size=CHUNK_SIZE):

to allow for two independent iterations over the response. However, this results in
AttributeError                            Traceback (most recent call last)
<ipython-input-2-3f918ff844c3> in <module>()
     35                     if chunk:
     36                         f.write(chunk)
---> 37                         f.flush()
     38 
     39 except ResponseTooBigException as e:

C:\Python34\lib\contextlib.py in __exit__(self, *exc_info)
    150         return self.thing
    151     def __exit__(self, *exc_info):
--> 152         self.thing.close()
    153 
    154 class redirect_stdout:

C:\Python34\lib\site-packages\requests\models.py in close(self)
    837         *Note: Should not normally need to be called explicitly.*
    838         """
--> 839         return self.raw.release_conn()

AttributeError: 'NoneType' object has no attribute 'release_conn'
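My guess is that copy.copy does not duplicate the underlying network stream. A quick check (my own interpretation from poking at the object - I have not verified this against the requests source) suggests the copy loses its raw attribute, which would explain why close() fails with the release_conn error:

import copy
import requests

response = requests.get('http://leil.de/di/files/more/testdaten/25mb.test', stream=True)
response_copy = copy.copy(response)
print(response.raw)       # a urllib3 HTTPResponse object
print(response_copy.raw)  # None in my test, so response_copy.close() fails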