
I have a problem: my file is not being downloaded fully. It always downloads just a few KB of the entire file, which is not what I want. Do you see a problem somewhere, or should I add some more logic in there?

Thank you

import os
import requests


def download(url: str, dest_folder: str):
    if not os.path.exists(dest_folder):
        os.makedirs(dest_folder)  # create folder if it does not exist

    filename = url.split('/')[-1].replace(" ", "_")  # be careful with file names
    file_path = os.path.join(dest_folder, filename)

    r = requests.get(url, stream=True)
    if r.ok:
        print("SUCCESS: Saving to", os.path.abspath(file_path))
        with open(file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024 * 8):
                if chunk:
                    f.write(chunk)
                    f.flush()
                    os.fsync(f.fileno())
    else:  # HTTP status code 4XX/5XX
        print("Download failed: status code {}\n{}".format(r.status_code, r.text))

## EXECUTE ##
download("SOME FILE URL",
         dest_folder="SOME FOLDER)```

  • What's the full size of the file you want to download? – jizhihaoSAMA Mar 25 '20 at 12:54
  • 40MB @jizhihaoSAMA – Vojtech Litavsky Mar 25 '20 at 12:55
  • Does this answer your question? [Download large file in python with requests](https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests) – The Pjot Mar 25 '20 at 12:55
  • @VojtechLitavsky so, you've set `chunk_size` to a smaller value? – AivanF. Mar 25 '20 at 13:00
  • @AivanF. Yes, I did, but it didn't help. I am still fighting with it. – Vojtech Litavsky Mar 26 '20 at 21:24
  • @VojtechLitavsky your previous comment seemed to say the linked question helped you solve the problem! You should explain more clearly. About the problem, do you know what part of the file gets saved? Also, try adding: `parts=[]` before the loop, `parts.append(len(chunk))` in the loop, and `print(f'{len(parts)} parts of {sum(parts)} bytes in total')` after it all, and compare the result with the actual file size. – AivanF. Mar 27 '20 at 08:15

1 Answer


First, could you try `shutil.copyfileobj` to write the file? It should work better than `f.write`:

import shutil

with open(file_path, 'wb') as f:
    shutil.copyfileobj(r.raw, f, length=16*1024*1024)  # copy in 16 MB chunks
    f.flush()
    os.fsync(f.fileno())
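
One caveat worth adding (my note, not part of the original answer): `r.raw` is the underlying urllib3 stream, so it bypasses requests' transparent decompression. If the server responds with `Content-Encoding: gzip`, the compressed bytes end up on disk. Telling the raw stream to decode first avoids that:

r.raw.decode_content = True  # decode gzip/deflate on the fly before copying
with open(file_path, 'wb') as f:
    shutil.copyfileobj(r.raw, f, length=16*1024*1024)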

Second, you can get the total file size. If your file size does not match it, you can re-download the file again.

total_content_size = int(requests.get(url, stream=True).headers['Content-Length'])
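
As a side note (an assumption of mine, not from the original answer): if you only need the size, a HEAD request avoids opening a download stream just to read the headers. Not every server returns `Content-Length` for HEAD, so keep the GET variant as a fallback:

response = requests.head(url, allow_redirects=True)  # fetch headers only
total_content_size = int(response.headers['Content-Length'])  # may be missing on some servers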

Third, this is what works for me to resolve the problem: download the file with resumption at break-points. This function resumes the download from the last break-point. You can add a while loop around it until the file size fully matches (see the sketch after the function below).

import os
import shutil
import requests


def download_file(url, file_path):
    file_name = url.rsplit('/', 1)[-1]
    file_full_location = os.path.join(file_path, file_name)

    # Total size of the remote file, read from the response headers.
    total_content_size = int(requests.get(url, stream=True).headers['Content-Length'])

    # If a partial file already exists, resume from its current size.
    if os.path.exists(file_full_location):
        temp_size = os.path.getsize(file_full_location)
        if total_content_size == temp_size:
            return  # already fully downloaded
    else:
        temp_size = 0

    # Ask the server for the remaining bytes only.
    headers = {'Range': 'bytes=%d-' % temp_size}
    with requests.get(url, stream=True, headers=headers) as response:
        response.raise_for_status()
        with open(file_full_location, 'ab') as f:  # append to the partial file
            shutil.copyfileobj(response.raw, f, length=16*1024*1024)
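
A minimal sketch of the while loop mentioned above (my own wrapper around the answer's function, with a made-up `max_attempts` guard so it cannot loop forever):

def download_until_complete(url, file_path, max_attempts=5):
    file_name = url.rsplit('/', 1)[-1]
    file_full_location = os.path.join(file_path, file_name)
    total_content_size = int(requests.get(url, stream=True).headers['Content-Length'])

    attempts = 0
    while attempts < max_attempts:
        download_file(url, file_path)  # resumes from the current partial size
        if os.path.getsize(file_full_location) == total_content_size:
            return  # local size matches the remote size, download complete
        attempts += 1
    raise RuntimeError('file still incomplete after %d attempts' % max_attempts)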

– W. Dan