63

I need to download a sizable (~200MB) file. I figured out how to download and save the file here. It would be nice to have a progress bar so I know how much has been downloaded. I found ProgressBar but I'm not sure how to incorporate the two.

Here's the code I tried, but it didn't work.

bar = progressbar.ProgressBar(max_value=progressbar.UnknownLength)
with closing(download_file()) as r:
    for i in range(20):
        bar.update(i)
Gamegoofs2

8 Answers

145

I suggest you try tqdm; it's very easy to use. Example code for downloading with the requests library:

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
leovp
  • Not sure if I'm missing something, but that doesn't display a progress bar for me, just the digits (I think because tqdm doesn't know the total size?) – Juicy Jan 08 '17 at 16:27
  • 1
    Yeah turns out you need to get the total length and pass it as parameter: `total = int(r.headers.get('content-length')); ...tqdm(r.iter_content(),total=total)...` – Juicy Jan 08 '17 at 16:31
  • Thanks! And add `unit='B', unit_scale=True` to the `tqdm()` call for human readable stats... – Willem Feb 13 '17 at 16:39
  • 6
    This works quite nicely! The only comment is that total_size = (int(r.headers.get('content-length', 0))/(32*1024)). This is because requests gets 32*1024 bytes at a time instead of 1 byte. – John Aug 19 '17 at 13:10
  • 4
    I'd add `with open('output.bin', 'wb') as f: with tqdm(total=total_size / (32*1024.0), unit='B', unit_scale=True, unit_divisor=1024) as pbar: for data in r.iter_content(32*1024): f.write(data); pbar.update(len(data))` – casper.dcl Jan 22 '18 at 18:54
  • I tried this code to download a 1.5GB file and it always stops at 48.5k. Why? – Hrvoje T Feb 16 '18 at 20:59
  • In file manager it shows 1.5GB. So I checked md5 and the file was the same, copied correctly. So 48500 x 32 x 1024 = 1 589 248 000 = 1.5GB – Hrvoje T Feb 16 '18 at 21:09
  • 1
    @HrvojeT the problem comes from the `total_size//block_size` part. It looks like this is iterating over (and showing progress over) blocks, not files. – shadowtalker Aug 01 '18 at 13:58
  • Keep simple, `wrote += f.write(data)` – dizcza Oct 08 '18 at 07:39
  • I've packaged this into a tiny Python package, for my own use. It's up on GitHub and PyPI: https://github.com/shaypal5/tqdl – Shay Palachy Affek Oct 27 '19 at 14:21
  • @casper.dcl replace "with tqdm(total=total_size / (32*1024.0), ...)" with "with tqdm(total=total_size, ...)" works for me. It looks like you don't have to divide the factor. – R. Yang Mar 06 '20 at 08:44
  • `content-length` reflects size of sent entity (often gzipped), while `len(data)` reflects size of already unzipped data, so this may not necessarily be true when data are gzipped by the server. – dzieciou Mar 25 '20 at 15:43
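
Regarding that last comment: when the server gzips the response, a bar fed with len(data) will not line up with Content-Length. As a rough sketch of one workaround (assuming the same test URL as above), you can advance the bar by the bytes actually read off the wire, which urllib3 exposes as response.raw.tell():

import requests
from tqdm import tqdm

url = "http://www.ovh.net/files/10Mb.dat"  # same test file as above

with requests.get(url, stream=True) as response:
    total_size_in_bytes = int(response.headers.get('content-length', 0))
    bytes_seen_on_wire = 0
    with tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True) as progress_bar, \
         open('test.dat', 'wb') as file:
        for data in response.iter_content(1024):
            file.write(data)
            # raw.tell() counts bytes pulled over the wire, so it lines up with
            # Content-Length even when the body is gzip-encoded
            progress_bar.update(response.raw.tell() - bytes_seen_on_wire)
            bytes_seen_on_wire = response.raw.tell()
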
26

The tqdm package now includes a function designed more specifically for this type of situation: wrapattr. You just wrap an object's read (or write) attribute, and tqdm handles the rest; there's no messing with block sizes or anything like that. Here's a simple download function that puts it all together with requests:

def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    from tqdm.auto import tqdm
    
    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Will only raise for 4xx/5xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path
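
For example, with a hypothetical URL and output path, the helper above could be called like this:

saved_to = download("http://www.ovh.net/files/10Mb.dat", "downloads/10Mb.dat")
print(saved_to)
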
Mike
  • 1
    This is a fantastic answer! One quick note. If you run this from VS Code 1.71 under WSL, TQDM keeps scrolling up. If you run it from a WSL command prompt, works great. I suspect there is a setting you need to apply when running directly from VS Code if the output is the integrated terminal window. – ejkitchen Sep 08 '22 at 19:01
6

It seems that there is a disconnect between the examples on the Progress Bar Usage page and what the code actually requires.

In the following example, note the use of maxval instead of max_value. Also note the use of .start() to initialize the bar. This has been noted in an Issue.

The n_chunk parameter denotes how many 1024-byte (1 KiB) blocks to stream at once while looping through the request iterator.

import requests
import time

import numpy as np

import progressbar


url = "http://wikipedia.com/"

def download_file(url, n_chunk=1):
    r = requests.get(url, stream=True)
    # Estimates the number of bar updates
    block_size = 1024
    file_size = int(r.headers.get('Content-Length', None))
    num_bars = np.ceil(file_size / (n_chunk * block_size))
    bar = progressbar.ProgressBar(maxval=num_bars).start()
    with open('test.html', 'wb') as f:
        for i, chunk in enumerate(r.iter_content(chunk_size=n_chunk * block_size)):
            f.write(chunk)
            bar.update(i+1)
            # Add a little sleep so you can see the bar progress
            time.sleep(0.05)
    return

download_file(url)

EDIT: Addressed comment about code clarity.
EDIT2: Fixed logic so bar reports 100% at completion. Credit to leovp's answer for using the 1 KiB block size.

andrew
5

The Python library enlighten can also be used; it is powerful, provides colorful progress bars, and works correctly on both Linux and Windows.

Below is the code plus a live screencast. The code can be run here on repl.it.

import math
import requests, enlighten

url = 'https://upload.wikimedia.org/wikipedia/commons/a/ae/Arthur_Streeton_-_Fire%27s_on_-_Google_Art_Project.jpg?download'
fname = 'image.jpg'

# Should be one global variable
MANAGER = enlighten.get_manager()

r = requests.get(url, stream = True)
assert r.status_code == 200, r.status_code
dlen = int(r.headers.get('Content-Length', '0')) or None

with MANAGER.counter(color = 'green', total = dlen and math.ceil(dlen / 2 ** 20), unit = 'MiB', leave = False) as ctr, \
     open(fname, 'wb', buffering = 2 ** 24) as f:
    for chunk in r.iter_content(chunk_size = 2 ** 20):
        print(chunk[-16:].hex().upper())
        f.write(chunk)
        ctr.update()

Output (+ ascii-video)

Arty
3

Here is an answer using tqdm.

import requests
from tqdm import tqdm


def download(url, fname):
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get('content-length', 0))
    with open(fname, 'wb') as file, tqdm(
            desc=fname,
            total=total,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
    ) as bar:
        for data in resp.iter_content(chunk_size=1024):
            size = file.write(data)
            bar.update(size)

Gist: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51

Yan QiDong
2

It seems like you're going to need to get the remote file size (answered here) to calculate how far along you are.

You could then update your progress bar while processing each chunk... if you know the total size and the size of the chunk, you can figure out when to update the progress bar.
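
As a rough sketch of that idea (the URL and file name here are just placeholders): fetch Content-Length with a HEAD request first, then report a percentage after each chunk is written:

import requests

url = "http://www.ovh.net/files/10Mb.dat"  # placeholder URL
total = int(requests.head(url, allow_redirects=True).headers.get('content-length', 0))

downloaded = 0
with requests.get(url, stream=True) as r, open('out.dat', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        if total:
            # plain percentage; any progress bar could be updated here instead
            print(f"\r{100 * downloaded // total}% ({downloaded}/{total} bytes)", end='')
print()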

Community
1

Comparing the total file size with how much you have already downloaded tells you how far along you are. Or you could use tqdm.
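
For instance, a minimal sketch of that comparison using the progressbar2 package from the question (the URL is a placeholder):

import progressbar  # the progressbar2 package, as in the question
import requests

url = "http://www.ovh.net/files/10Mb.dat"  # placeholder URL

r = requests.get(url, stream=True)
total = int(r.headers.get('content-length', 0)) or progressbar.UnknownLength

bar = progressbar.ProgressBar(max_value=total)
downloaded = 0
with open('out.dat', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        bar.update(downloaded)  # bytes downloaded so far vs. the total
bar.finish()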

Hxdfgs
0

For some reason I couldn't get the file size with requests when working with zip files, so I used urllib to get it.

# A simple downloader with progress bar

import requests
from tqdm import tqdm
import zipfile
from urllib.request import urlopen

url = "https://web.cs.dal.ca/~juanr/downloads/malnis_dataset.zip"
block_size = 1024 #1 Kibibyte

filename = url.split("/")[-1]
print(f"Downloading {filename}...")
site = urlopen(url)
meta = site.info()
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes = int(meta["Content-Length"])
progress_bar = tqdm(total = total_size_in_bytes, unit='iB', unit_scale=True)
with open(filename, 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)
progress_bar.close()
print("Download complete")
print(f"Extracting {filename}...")
with zipfile.ZipFile(filename, "r") as zip_file:
    zip_file.extractall()
print("Extracting complete")