I don't think requests has this built in, but you can do it manually pretty easily (as long as the server supports it).
The key is Range requests. To fetch part of a resource starting at byte 12345, you add this header:
Range: bytes=12345-
And then you can just append the results onto your file.
Ideally, you should verify that you get back a 206 Partial Content instead of a 200, and that the headers include the range you wanted:
Content-Range: bytes 12345-123455/123456
Content-Length: 111111
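As a rough sketch of that check: the helper below takes a response's status code and headers and confirms that the server really resumed at the offset you asked for. The name check_partial_response and its arguments are made up for illustration; nothing like it ships with requests.

```python
import re

def check_partial_response(status_code, headers, pos):
    """Return True if the response resumes at byte `pos` (hypothetical helper)."""
    if status_code != 206:
        # A 200 means the server ignored Range and is sending the whole file.
        return False
    # Expected shape: "bytes <first>-<last>/<total>", where total may be "*".
    # Note: a plain dict lookup is case-sensitive; requests' headers are not.
    value = headers.get("Content-Range", "").strip()
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value)
    if match is None:
        return False
    first, last = int(match.group(1)), int(match.group(2))
    return first == pos and last >= first
```

With requests you would call it as check_partial_response(response.status_code, response.headers, pos) before writing anything to the file.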
You may also want to pre-validate that the server handles ranges. You can do this by looking at the headers in your initial response, or by doing a HEAD, and checking for this:
Accept-Ranges: bytes
If the header is missing entirely, or has none as a value, or has a list of values that doesn't include bytes, the server doesn't support resuming.
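Those three cases can be folded into one small test. This is only a sketch; supports_byte_ranges is a hypothetical name, and it assumes you pass in the headers from your initial response or from a HEAD.

```python
def supports_byte_ranges(headers):
    """Return True if Accept-Ranges lists 'bytes' (hypothetical helper)."""
    # A missing header becomes an empty string, which yields no matching unit.
    value = headers.get("Accept-Ranges", "")
    # The header may carry a comma-separated list of range units.
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units
```

For example, supports_byte_ranges(requests.head(url).headers) would tell you whether it's worth sending a Range header at all.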
You may also want to check the Content-Length to verify that you didn't already finish the whole file right before getting interrupted.
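A minimal sketch of that last check, assuming content_length is the full size of the resource (taken from a HEAD or from a response to a request without a Range header, not from a 206). The name already_complete is hypothetical.

```python
import os

def already_complete(filename, content_length):
    """Return True if the local file already holds the full resource."""
    try:
        local_size = os.path.getsize(filename)
    except FileNotFoundError:
        # Nothing downloaded yet, so there is nothing to resume either.
        return False
    # Header values arrive as strings, so convert before comparing.
    return local_size == int(content_length)
```

If this returns True, you can skip the request entirely instead of asking the server for a range that starts past the end of the file.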
So, the code would look something like this:
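import requests
from tqdm import tqdm

def fetch_or_resume(url, filename):
    with open(filename, 'ab') as f:
        headers = {}
        pos = f.tell()  # bytes already on disk; this is where to resume
        if pos:
            headers['Range'] = f'bytes={pos}-'
        response = requests.get(url, headers=headers, stream=True)
        if pos:
            # Placeholder: your own check for a 206 and a matching Content-Range
            validate_as_you_want_(pos, response)
        # Content-Length can be absent (e.g. chunked encoding), so default to 0
        total_size = int(response.headers.get('content-length', 0))
        chunks = response.iter_content(chunk_size=1024)
        # total=None makes tqdm show a plain counter when the size is unknown
        for data in tqdm(chunks, total=total_size // 1024 or None, unit='KB'):
            f.write(data)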
One common bug from people writing download-manager-type software is trying to keep track of how much has been read in previous requests. Don't do that; just use the file itself to tell you how much you have. After all, if you read 23456 bytes but only flushed 12345 to the file, that 12345 is where you want to start.