602

Requests is a really nice library. I'd like to use it for downloading big files (>1 GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is a problem with the following code:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

For some reason it doesn't work this way; it still loads the response into memory before it is saved to a file.

Roman Podlinov
  • 1
    The requests library is nice, but not intended for this purpose. I would suggest using a different library such as urllib3. https://stackoverflow.com/questions/17285464/whats-the-best-way-to-download-file-using-urllib3 – user8550137 May 03 '23 at 16:04

9 Answers

957

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                # If you have a chunk-encoded response, uncomment the if
                # below and set chunk_size to None above.
                #if chunk:
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by iter_content is not exactly chunk_size; it varies from iteration to iteration and is often considerably larger.

See body-content-workflow and Response.iter_content for further reference.
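
Since the chunks yielded by iter_content vary in size, overall progress is easiest to track against the Content-Length header when the server sends one. A small usage sketch of that idea (the helper name and the crude percentage printout are my own additions, not part of the answer):

import requests

def download_with_progress(url, local_filename):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get('Content-Length', 0))  # 0 if the header is missing
        written = 0
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                written += len(chunk)
                if total:
                    print(f'\r{written / total:.1%}', end='')  # crude progress display
    print()
    return local_filename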

Jenia
Roman Podlinov
  • @Shuman This code successfully downloads files which are bigger than 1.5 GB. Can you download the file via any browser successfully? – Roman Podlinov May 14 '14 at 11:55
  • yes in firefox if i download manually, it successfully saves out a 1.5GB .zip file – Shuman May 14 '14 at 13:58
  • 9
    @Shuman As I see, you resolved the issue when you switched from http:// to https:// (https://github.com/kennethreitz/requests/issues/2043). Can you please update or delete your comments, because people may think that there are issues with the code for files bigger than 1024 MB – Roman Podlinov May 14 '14 at 18:15
  • 18
    The `chunk_size` is crucial. By default it's 1 (1 byte), which means that for 1 MB it'll make 1 million iterations. http://docs.python-requests.org/en/latest/api/#requests.Response.iter_content – Eduard Gamonal Mar 25 '15 at 13:06
  • 1
    Is it possible to parallelize the iter_content() part somehow to speed up the download? Thanks! – Rovin Bhandari May 06 '15 at 11:46
  • 1
    @RomanPodlinov, do you mind telling me why are you using the flush? – Fernando Freitas Alves May 15 '15 at 19:52
  • you could use url.rsplit('/', 1)[1] as well, which will not split the whole url but only the last part of it. – reox May 27 '15 at 11:27
  • @RovinBhandari: to parallelize, [find out whether there is support for bytes range http header in `requests`](http://stackoverflow.com/q/13973188/4279) – jfs Sep 28 '15 at 01:34
  • `url.split('/')[-1]` might be too simplistic e.g., see [`url2filename()`](https://gist.github.com/zed/c2168b9c52b032b5fb7d) – jfs Sep 28 '15 at 01:35
  • 4
    `f.flush()` seems unnecessary. What are you trying to accomplish using it? (your memory usage won't be 1.5gb if you drop it). `f.write(b'')` (if `iter_content()` may return an empty string) should be harmless and therefore `if chunk` could be dropped too. – jfs Sep 28 '15 at 01:40
  • @J.F.Sebastian I agree, url2filename is better. About flush: the idea is to flush the data into the physical file on the drive. If you see that the code works fine without flush(), just remove it. – Roman Podlinov Sep 28 '15 at 18:31
  • 13
    @RomanPodlinov: `f.flush()` doesn't flush data to the physical disk. It transfers the data to the OS. Usually, that is enough unless there is a power failure. `f.flush()` makes the code slower here for no reason. The flush happens when the corresponding file buffer (inside the app) is full. If you need more frequent writes, pass a buffer size to `open()`. – jfs Sep 28 '15 at 19:08
  • @J.F.Sebastian Thank you I commented the flush row in the code – Roman Podlinov Oct 06 '15 at 14:12
  • 6
    ``if chunk: # filter out keep-alive new chunks`` – it is redundant, isn't it? Since ``iter_content()`` always yields string and never yields ``None``, it looks like premature optimization. I also doubt it can ever yield empty string (I cannot imagine any reason for this). – y0prst Feb 27 '16 at 05:35
  • In the case you use dropbox links, it will save your file with a name like "Banner_apus_1.23.zip?dl=1" – Павел Иванов May 17 '16 at 08:16
  • @paus Double check what you provide as a link. If Dropbox adds something into url (or redirect to other url) you can easy remove it. Just change how you set local_filename variable. – Roman Podlinov May 17 '16 at 11:42
  • @y0prst Please pay attention to the comment "filter out keep-alive new chunks" for this line. If you download a file that is several GB in size, it makes total sense. – Roman Podlinov May 17 '16 at 11:49
  • 3
    @RomanPodlinov I'm not familiar with the term "keep-alive new chunks". Can you explain it a bit further? There are keep-alive (persistent) connections (when several HTTP requests are contained in a single TCP connection) and chunked responses (when there is no Content-Length header and the content is divided into chunks, the last one zero-length). AFAIK, these two features are independent; they have nothing in common. – y0prst May 21 '16 at 06:39
  • 2
    @RomanPodlinov Another point: iter_content() always yields a string. There is nothing wrong with writing an empty string to a file, right? So why should we check the length? – y0prst May 21 '16 at 06:51
  • 6
    @RomanPodlinov And one more point, sorry :) After reading iter_content() sources I've concluded that it cannot ever yield an empty string: there are emptiness checks everywhere. The main logic here: [requests/packages/urllib3/response.py](https://github.com/kennethreitz/requests/blob/master/requests/packages/urllib3/response.py#L332). – y0prst May 21 '16 at 06:59
  • But why not `shutil.copyfileobj`? – stek29 Aug 17 '16 at 09:32
  • @stek29 coz response and response.iter_content is not file-like object? – Reishin Sep 20 '16 at 10:37
  • 1
    @stek29 Example with `shutil.copyfileobj` is below by using Response.raw – Reishin Sep 20 '16 at 11:03
  • @y0prst "I'm not familiar with the term "keep-alive new chunks"." On the one hand, I don't know who added this comment into the code; on the other hand, you are changing my words. This line of code removes empty chunks which appear from time to time, probably because of keep-alive requests during the download – Roman Podlinov Oct 01 '16 at 15:14
  • @RomanPodlinov in regards to the "keep-alive chunks" check that you and y0prst were discussing; was the conclusion that it is unnecessary because requests never returns an empty string thanks to internal checks? – jkarimi May 27 '17 at 00:13
  • @RomanPodlinov this line seems to suggest so at least in the case of 'file-like objects': https://github.com/kennethreitz/requests/blob/84099eea9f99168b73ff48270dcab24e5a1ee959/requests/models.py#L750 – jkarimi May 27 '17 at 00:22
  • For a 5 GB file the above code is taking forever. What would be the ideal chunk size to use in this case? Is there anything we can do to improve the download speed? – user3508811 Oct 04 '18 at 00:35
  • @user3508811 In your case I recommend to use my small lib https://github.com/keepitsimple/pyFTPclient it can reconnect and use multiple simultaneous connections for download. I used this small lib for downloading files of size 1-10 GBs – Roman Podlinov Oct 18 '18 at 10:43
  • @RomanPodlinov - I couldn't adapt pyFTPclient to download from a link, let's say `https://hostname.company.com/ui/containers/9888577`. How would the following lines change to download from a link? `obj = PyFTPclient('192.168.0.59', 2121, 'test', 'testftp') obj.DownloadFile('USAHD-8974-20131013-0300-0330.ts')` – user3508811 Oct 28 '18 at 18:00
  • @user3508811 pyFTPclient was implemented for FTP protocol. – Roman Podlinov Nov 09 '18 at 18:07
  • And remember flush after the writing to a file with `stream=True` if you're trying to get hash / size of the files right after the download - you may be missing a few (hundreds) bytes if you don't . – synweap15 Dec 12 '18 at 08:05
  • How do you know this is not occupying lots of memory? Looking at the process monitor? When I run: import sys print(sys.getsizeof(r.text)) I get the same size outputted whether I use your stream code above or not – newbie Dec 25 '18 at 21:42
  • @newbie I don't know what OS you use. I use `htop` under Linux or `Process Monitor` from Sysinternals under Windows – Roman Podlinov Jan 03 '19 at 16:59
  • 1
    To @0xcaff's "Don't forget to close the connection with r.close()" - No, that's wrong: `with` will close the connection automatically – Roman Podlinov May 06 '20 at 09:33
  • 1
    @RomanPodlinov It was not using `with` when I made this comment. – 0xcaff May 06 '20 at 19:11
  • I can't download a zip file of 221 MB. The downloaded file size maxes out at 219 KB every time I tried this code. – Naveen Reddy Marthala Jun 21 '20 at 04:44
  • What if I use `f = requests.get(url, stream=True)` and then `for chunk in f.iter_content(chunk_size=8192)` without using `with` - would it work? – TomSawyer Sep 10 '20 at 07:34
  • @RomanPodlinov Is it right to write directly to disk? That is, 8192 or 512*1024 bytes is not that big, so for a 100 MB file this will do a lot of write operations. Could this be an issue? How can I handle it? – Pavel Angel Mendoza Villafane Dec 28 '20 at 17:32
  • How can I write the same for the POST method? – Jis Mathew Feb 25 '21 at 18:36
  • I would suggest `os.path.basename(url)` to get the filename – hellojoshhhy Mar 18 '22 at 02:51
  • So: `if 'transfer-encoding' in r.headers.keys(): if 'chunked' in r.headers['transfer-encoding']: chunk_size = None` (sketched below) – deadcow Jan 25 '23 at 19:24
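
Picking up the suggestion in the last comment above, a hedged sketch of that header check (the function name and the 8192-byte fallback are my own; requests looks headers up case-insensitively, so 'Transfer-Encoding' matches whatever casing the server uses):

import requests

def download_file(url, local_filename, default_chunk_size=8192):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # A chunk-encoded response has no usable Content-Length, so let
        # iter_content yield the chunks as they arrive (chunk_size=None).
        if 'chunked' in r.headers.get('Transfer-Encoding', ''):
            chunk_size = None
        else:
            chunk_size = default_chunk_size
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return local_filename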
524

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.

Note: According to the documentation, Response.raw will not decode gzip and deflate transfer-encodings, so you will need to do this manually.
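
One workaround, also pointed out in the comments below, is to force decode_content=True on the underlying urllib3 read() so that copyfileobj receives decoded bytes. A minimal sketch of that variant:

import functools
import shutil
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Ask urllib3 to undo gzip/deflate content-encoding while reading.
        r.raw.read = functools.partial(r.raw.read, decode_content=True)
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename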

Shiva
John Zwinck
  • 20
    Note that you may need to adjust when [streaming gzipped responses](https://github.com/kennethreitz/requests/issues/2155) per issue 2155. – ChrisP Sep 29 '16 at 01:15
  • 1
    Have you tested this code for big file downloads, >1 GB? – Roman Podlinov Oct 01 '16 at 15:17
  • 1
    Yes, I did. Most of the files were >1 GB; the code was downloading a bunch of video files on a daily basis – Roman Podlinov Dec 28 '16 at 20:09
  • 78
    THIS should be the correct answer! The [accepted](https://stackoverflow.com/a/16696317/2662454) answer gets you up to 2-3MB/s. Using copyfileobj gets you to ~40MB/s. Curl downloads (same machines, same url, etc) with ~50-55 MB/s. – visoft Jul 12 '17 at 07:05
  • @visoft how did you check the download speeds? – Moondra Sep 20 '17 at 15:24
  • @Moondra From Python, dividing the file size by the download time. I used fairly large files (100M-2G) over a gigabit connection. The server was more or less in the same network/datacenter. – visoft Sep 24 '17 at 08:33
  • 6
    A small caveat for using `.raw` is that it does not handle decoding. Mentioned in the docs here: http://docs.python-requests.org/en/master/user/quickstart/#raw-response-content – Eric Cousineau Dec 17 '17 at 01:03
  • Is it possible to stream to stdout through the `print`? – Vitaly Zdanevich Apr 10 '18 at 15:52
  • 2
    @VitalyZdanevich: Try `shutil.copyfileobj(r.raw, sys.stdout)`. – John Zwinck Apr 11 '18 at 02:59
  • 5
    @visoft I was able to match the download speeds between `raw` and `iter_content` after I increased `chunk_size` from `1024` to `10*1024` (debian ISO, regular connection) – Jan Benes Aug 10 '18 at 13:17
  • 4
    The issue with the accepted answer is the chunk size. If you have a sufficiently fast connection, 1KiB is too small, you spend too much time on overhead compared to transferring data. `shutil.copyfileobj` defaults to 16KiB chunks. Increasing the chunk size from 1KiB will almost certainly increase download rate, but don't increase too much. I am using 1MiB chunks and it works well, it approaches full bandwidth usage. You could try to monitor connection rate and adjust chunk size based on it, but beware premature optimization. – theferrit32 Jan 15 '19 at 19:15
  • 13
    @EricCousineau You can patch up this behaviour [replacing the `read` method: `response.raw.read = functools.partial(response.raw.read, decode_content=True)`](https://github.com/requests/requests/issues/2155#issuecomment-50771010) – Nuno André Jan 27 '19 at 12:39
  • Is there any way to limit the streaming read here to a max value, say 128 KiB? – Asclepius Feb 11 '19 at 23:00
  • 2
    Meanwhile it's 2019. I took the liberty of editing the missing `with requests.get(url, stream=True) as r:` into the answer. There's no reason not to use it. – vog Jun 06 '19 at 13:15
  • 1
    @vog, the source code (at least, in the latest requests) already includes the with statement `with sessions.Session() as session: return session.request(method=method, url=url, **kwargs)` – Max Jun 10 '19 at 21:39
  • 5
    Adding length param got me better download speeds `shutil.copyfileobj(r.raw, f, length=16*1024*1024)` – citynorman Feb 07 '20 at 22:27
  • For me, I got back an appropriate sized object, but my machine told me the file was corrupt. I am working with pdf files and no application can open what I just downloaded. – demongolem Apr 14 '20 at 15:55
  • just gonna bump this because this was so FAST and simple to download multiple 1GB+ files compared to others – Matt Jun 22 '20 at 16:40
  • for me this results in an invalid tarball: `gzip: stdin: not in gzip format` but if I download it via browser the tar format is gzip. – KIC Feb 12 '21 at 10:46
  • 3
    Updated link to github [issue 2155](https://github.com/psf/requests/issues/2155) about streaming gzipped responses (the link in ChrisP's answer no longer works). – Christian Long May 07 '21 at 22:36
  • 1
    it seems to me that `shutil.copyfileobj` is returning before the download is complete. Is there a way of blocking until the file has completely downloaded? – craq Oct 12 '21 at 00:30
  • @craq Are you perhaps seeing some delay in your filesystem? `shutil.copyfileobj` doesn't exactly return before completion, but your filesystem may have some delay before readers observe the file being completely written. – John Zwinck Oct 15 '21 at 01:49
  • @JohnZwinck yes that could be it. I couldn't figure out an elegant way to check that the full file had been written, but I haven't seen any issues since I added a simple sleep. – craq Oct 15 '21 at 05:03
  • 1
    I like this approach better, but how would I implement tqdm with this one? – Source Matters Apr 21 '22 at 04:41
  • @SourceMatters: if a progress bar is important to you, this solution won't be the most straightforward. – John Zwinck Apr 23 '22 at 05:57
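
For the progress-bar question in the comments above: one option is tqdm's wrapattr helper, which wraps the raw stream's read() so copyfileobj still does the copying. A sketch, assuming the third-party tqdm package is installed and the server sends Content-Length:

import shutil
import requests
from tqdm import tqdm

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get('Content-Length', 0)) or None  # None means unknown size
        # Every read() on the wrapped stream advances the progress bar.
        with tqdm.wrapattr(r.raw, 'read', total=total, desc=local_filename) as raw:
            with open(local_filename, 'wb') as f:
                shutil.copyfileobj(raw, f)
    return local_filename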
112

Not exactly what OP was asking, but... it's ridiculously easy to do that with urllib:

from urllib.request import urlretrieve

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
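
If you would rather watch progress from Python itself instead of an external watch, urlretrieve also accepts a reporthook callback. A small sketch (the percentage formatting is my own):

from urllib.request import urlretrieve

def report(block_num, block_size, total_size):
    # Called by urlretrieve after each block has been fetched.
    if total_size > 0:
        downloaded = min(block_num * block_size, total_size)
        print(f'\r{downloaded / total_size:.1%}', end='')

url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst, reporthook=report)
print()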

Gringo Suave
x-yuri
  • 2
    For Python 2.x, use `from urllib import urlretrieve` – Vadim Kotov Apr 09 '18 at 14:19
  • 1
    This function "might become deprecated at some point in the future." cf. https://docs.python.org/3/library/urllib.request.html#legacy-interface – Wok Apr 08 '22 at 11:28
46

Your chunk size could be too large; have you tried dropping that to maybe 1024 bytes at a time? (Also, you could use with to tidy up the syntax.)

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return 

Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if Python isn't flushing the data to the file. Based on other SO questions, you could try f.flush() and os.fsync() (remember to import os) to force the file write and free memory:

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
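
As for how to tell whether the response really is being held in memory: one option is to sample the process's resident set size while the download runs. A sketch using the third-party psutil package (my own addition, not something this answer relies on):

import psutil    # third-party: pip install psutil
import requests

proc = psutil.Process()

def download_file(url, local_filename):
    # Toggle stream=True/False to compare how much memory each approach holds on to.
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for i, chunk in enumerate(r.iter_content(chunk_size=1024)):
            f.write(chunk)
            if i % 1024 == 0:   # report roughly once per downloaded MB
                print(f'RSS: {proc.memory_info().rss / 1e6:.1f} MB')
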
danodonovan
13

Use Python's wget module instead. Here is a snippet:

import wget
wget.download(url)
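
If I recall the wget package's API correctly (worth verifying against its own documentation), download() also takes an out argument for choosing the destination path:

import wget
# 'out' is assumed from the wget package's signature; verify before relying on it.
wget.download(url, out='/tmp/large-file.iso')
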
9

Based on Roman's most upvoted comment above, here is my implementation, including a "download as" and a "retries" mechanism:

import logging
import os
import time
from urllib.parse import urlparse

import requests

logger = logging.getLogger(__name__)

def download(url: str, file_path='', attempts=2):
    """Downloads a URL content into a file (with large file support by streaming)

    :param url: URL to download
    :param file_path: Local file name to contain the data downloaded
    :param attempts: Number of attempts
    :return: New file path. Empty string if the download failed
    """
    if not file_path:
        file_path = os.path.realpath(os.path.basename(url))
    logger.info(f'Downloading {url} content to {file_path}')
    url_sections = urlparse(url)
    if not url_sections.scheme:
        logger.debug('The given url is missing a scheme. Adding http scheme')
        url = f'http://{url}'
        logger.debug(f'New url: {url}')
    for attempt in range(1, attempts+1):
        try:
            if attempt > 1:
                time.sleep(10)  # 10 seconds wait time between downloads
            with requests.get(url, stream=True) as response:
                response.raise_for_status()
                with open(file_path, 'wb') as out_file:
                    for chunk in response.iter_content(chunk_size=1024*1024):  # 1MB chunks
                        out_file.write(chunk)
                logger.info('Download finished successfully')
                return file_path
        except Exception as ex:
            logger.error(f'Attempt #{attempt} failed with error: {ex}')
    return ''
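
Usage might look something like this (the URL is just a placeholder):

import logging
logging.basicConfig(level=logging.INFO)   # so the logger calls above are visible

saved_path = download('https://example.com/big-archive.zip', attempts=3)
if not saved_path:
    print('Download failed after all attempts')
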
Ben Moskovitch
3

Here is an additional approach for the use case of an async chunked download, without reading all the file content into memory.
It means that both the read from the URL and the write to the file are implemented with asyncio libraries (aiohttp to read from the URL and aiofiles to write the file).

The following code should work on Python 3.7 and later.
Just edit the SRC_URL and DEST_FILE variables before copying and pasting.

import aiofiles
import aiohttp
import asyncio

async def async_http_download(src_url, dest_file, chunk_size=65536):
    async with aiofiles.open(dest_file, 'wb') as fd:
        async with aiohttp.ClientSession() as session:
            async with session.get(src_url) as resp:
                async for chunk in resp.content.iter_chunked(chunk_size):
                    await fd.write(chunk)

SRC_URL = "/path/to/url"
DEST_FILE = "/path/to/file/on/local/machine"

asyncio.run(async_http_download(SRC_URL, DEST_FILE))
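
Because the download is a coroutine, several files can be fetched concurrently by reusing async_http_download with asyncio.gather. A usage sketch (the URLs are placeholders):

import asyncio

async def download_many(jobs):
    # jobs is an iterable of (url, destination_path) pairs.
    await asyncio.gather(*(async_http_download(url, dest) for url, dest in jobs))

jobs = [
    ("https://example.com/a.iso", "a.iso"),
    ("https://example.com/b.iso", "b.iso"),
]
asyncio.run(download_many(jobs))
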
J.M.
2

requests is good, but how about a socket solution?

def stream_(host):
    import socket
    import ssl
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # create_default_context() defaults to Purpose.SERVER_AUTH, which is what a
        # client needs in order to verify the server's certificate.
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host) as wrapped_socket:
            wrapped_socket.connect((socket.gethostbyname(host), 443))
            wrapped_socket.send(
                f"GET / HTTP/1.1\r\nHost: {host}\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\n\r\n".encode())

            resp = b""
            while resp[-4:-1] != b"\r\n\r":
                resp += wrapped_socket.recv(1)
            else:
                resp = resp.decode()
                content_length = int("".join([tag.split(" ")[1] for tag in resp.split("\r\n") if "content-length" in tag.lower()]))
                image = b""
                while content_length > 0:
                    data = wrapped_socket.recv(2048)
                    if not data:
                        print("EOF")
                        break
                    image += data
                    content_length -= len(data)
                with open("image.jpeg", "wb") as file:
                    file.write(image)

r1v3n
  • 4
    I'm curious what's the advantange of using this instead of a higher level (and well tested) method from libs like requests? – tuxillo Apr 21 '22 at 22:18
  • 3
    Libs like requests are full of abstraction above the native sockets. That's not the best algorithm, but it could be faster because of no abstraction at all. – r1v3n May 07 '22 at 21:00
  • 1
    It appears this loads the whole content into memory in the "image" variable, then writes it to a file. How does this work for large files with local memory constraints? – rayzinnz May 10 '23 at 17:10
  • Yeah, you can just modify this if you want. For example, change the last part with the `image` variable and write to the file itself instead of to a variable (as sketched below) – r1v3n Jun 01 '23 at 08:38
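
Following up on the last two comments, here is a hedged variant that writes each chunk straight to disk instead of accumulating it in the image variable (the function name and parameters are my own; like the original, it assumes a plain Content-Length response rather than chunked encoding):

import socket
import ssl

def stream_to_file(host, path="/", dest="image.jpeg", chunk_size=2048):
    context = ssl.create_default_context()
    with socket.create_connection((host, 443)) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.send(f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
            # Read byte by byte until the end of the response headers.
            resp = b""
            while not resp.endswith(b"\r\n\r\n"):
                resp += tls.recv(1)
            headers = resp.decode("iso-8859-1")
            content_length = int([line.split(":", 1)[1]
                                  for line in headers.split("\r\n")
                                  if line.lower().startswith("content-length")][0])
            remaining = content_length
            with open(dest, "wb") as f:
                while remaining > 0:
                    data = tls.recv(min(chunk_size, remaining))
                    if not data:
                        break
                    f.write(data)       # each chunk goes straight to disk
                    remaining -= len(data)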
0

Yet another option for downloading large files. This one lets you stop the download and continue later (press the Enter key to stop), and it resumes from where you left off if your connection gets dropped.

import datetime
import os
import requests
import threading as th

keep_going = True
def key_capture_thread():
    global keep_going
    input()
    keep_going = False
pkey_capture = th.Thread(target=key_capture_thread, args=(), name='key_capture_process', daemon=True)
pkey_capture.start()

def download_file(url, local_filepath):
    #assumptions:
    #  headers contain Content-Length:
    #  headers contain Accept-Ranges: bytes
    #  stream is not encoded (otherwise start bytes are not known, unless this is stored separately)
    
    chunk_size = 1048576 #1MB
    # chunk_size = 8096 #8KB
    # chunk_size = 1024 #1KB
    decoded_bytes_downloaded_this_session = 0
    start_time = datetime.datetime.now()
    if os.path.exists(local_filepath):
        decoded_bytes_downloaded = os.path.getsize(local_filepath)
    else:
        decoded_bytes_downloaded = 0
    with requests.Session() as s:
        with s.get(url, stream=True) as r:
            #check for required headers:
            if 'Content-Length' not in r.headers:
                print('STOP: request headers do not contain Content-Length')
                return
            if ('Accept-Ranges','bytes') not in r.headers.items():
                print('STOP: request headers do not contain Accept-Ranges: bytes')
                with s.get(url) as r:
                    print(str(r.content, encoding='iso-8859-1'))
                return
        content_length = int(r.headers['Content-Length'])
        if decoded_bytes_downloaded >= content_length:
            print('STOP: file already downloaded. decoded_bytes_downloaded>=r.headers[Content-Length]; {}>={}'.format(decoded_bytes_downloaded, r.headers['Content-Length']))
            return
        if decoded_bytes_downloaded>0:
            s.headers['Range'] = 'bytes={}-{}'.format(decoded_bytes_downloaded, content_length-1) #range is inclusive
            print('Retrieving byte range (inclusive) {}-{}'.format(decoded_bytes_downloaded, content_length-1))
        with s.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filepath, mode='ab') as fwrite:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    decoded_bytes_downloaded+=len(chunk)
                    decoded_bytes_downloaded_this_session+=len(chunk)
                    time_taken:datetime.timedelta = (datetime.datetime.now() - start_time)
                    seconds_per_byte = time_taken.total_seconds()/decoded_bytes_downloaded_this_session
                    remaining_bytes = content_length-decoded_bytes_downloaded
                    remaining_seconds = seconds_per_byte * remaining_bytes
                    remaining_time = datetime.timedelta(seconds=remaining_seconds)
                    #print updated statistics here
                    fwrite.write(chunk)
                    if not keep_going:
                        break

output_folder = '/mnt/HDD1TB/DownloadsBIG'

# url = 'https://file-examples.com/storage/fea508993d645be1b98bfcf/2017/10/file_example_JPG_100kB.jpg'
# url = 'https://file-examples.com/storage/fe563fce08645a90397f28d/2017/10/file_example_JPG_2500kB.jpg'
url = 'https://ftp.ncbi.nlm.nih.gov/blast/db/nr.00.tar.gz'

local_filepath = os.path.join(output_folder, os.path.split(url)[-1])

download_file(url, local_filepath)
rayzinnz