63

I need to download a sizable (~200MB) file. I figured out how to download and save the file here. It would be nice to have a progress bar so I know how much has been downloaded. I found ProgressBar but I'm not sure how to incorporate the two.

Here's the code I tried, but it didn't work.

bar = progressbar.ProgressBar(max_value=progressbar.UnknownLength)
with closing(download_file()) as r:
    for i in range(20):
        bar.update(i)
Gamegoofs2

8 Answers

145

I suggest you try tqdm; it's very easy to use. Example code for downloading with the requests library:

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
leovp
  • Not sure if I'm missing something, but that doesn't display a progress bar for me, just the digits (I think because tqdm doesn't know the total size?) – Juicy Jan 08 '17 at 16:27
  • 1
    Yeah turns out you need to get the total length and pass it as parameter: `total = int(r.headers.get('content-length')); ...tqdm(r.iter_content(),total=total)...` – Juicy Jan 08 '17 at 16:31
  • Thanks! And add `unit='B', unit_scale=True` to the `tqdm()` call for human readable stats... – Willem Feb 13 '17 at 16:39
  • 6
    This works quite nicely! The only comment is that total_size = (int(r.headers.get('content-length', 0))/(32*1024)). This is because requests gets 32*1024 bytes at a time instead of 1 byte. – John Aug 19 '17 at 13:10
  • 4
    I'd add `with open('output.bin', 'wb') as f: with tqdm(total=total_size / (32*1024.0), unit='B', unit_scale=True, unit_divisor=1024) as pbar: for data in r.iter_content(32*1024): f.write(data); pbar.update(len(data))` – casper.dcl Jan 22 '18 at 18:54
  • I tried this code to download a 1.5GB file and it always stops at 48.5k. Why? – Hrvoje T Feb 16 '18 at 20:59
  • In file manager it shows 1.5GB. So I checked md5 and the file was the same, copied correctly. So 48500 x 32 x 1024 = 1 589 248 000 = 1.5GB – Hrvoje T Feb 16 '18 at 21:09
  • 1
    @HrvojeT the problem comes from the `total_size//block_size` part. It looks like this is iterating over (and showing progress over) blocks, not files. – shadowtalker Aug 01 '18 at 13:58
  • Keep simple, `wrote += f.write(data)` – dizcza Oct 08 '18 at 07:39
  • I've packaged this into a tiny Python package, for my own use. It's up on GitHub and PyPI: https://github.com/shaypal5/tqdl – Shay Palachy Affek Oct 27 '19 at 14:21
  • @casper.dcl replace "with tqdm(total=total_size / (32*1024.0), ...)" with "with tqdm(total=total_size, ...)" works for me. It looks like you don't have to divide the factor. – R. Yang Mar 06 '20 at 08:44
  • `content-length` reflects size of sent entity (often gzipped), while `len(data)` reflects size of already unzipped data, so this may not necessarily be true when data are gzipped by the server. – dzieciou Mar 25 '20 at 15:43
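
Regarding that last comment: when the server gzips the response, a bar fed with len(data) will not line up with Content-Length. As a rough sketch of one workaround (assuming the same test URL as above), you can advance the bar by the bytes actually read off the wire, which urllib3 exposes as response.raw.tell():

import requests
from tqdm import tqdm

url = "http://www.ovh.net/files/10Mb.dat"  # same test file as above

with requests.get(url, stream=True) as response:
    total_size_in_bytes = int(response.headers.get('content-length', 0))
    bytes_seen_on_wire = 0
    with tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True) as progress_bar, \
         open('test.dat', 'wb') as file:
        for data in response.iter_content(1024):
            file.write(data)
            # raw.tell() counts bytes pulled over the wire, so it lines up with
            # Content-Length even when the body is gzip-encoded
            progress_bar.update(response.raw.tell() - bytes_seen_on_wire)
            bytes_seen_on_wire = response.raw.tell()
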
26

The tqdm package now includes a function designed more specifically for this type of situation: wrapattr. You just wrap an object's read (or write) attribute, and tqdm handles the rest; there's no messing with block sizes or anything like that. Here's a simple download function that puts it all together with requests:

def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    from tqdm.auto import tqdm
    
    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Will only raise for 4xx/5xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path
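
For example, with a hypothetical URL and output path, the helper above could be called like this:

saved_to = download("http://www.ovh.net/files/10Mb.dat", "downloads/10Mb.dat")
print(saved_to)
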
Mike
  • 1
    This is a fantastic answer! One quick note. If you run this from VS Code 1.71 under WSL, TQDM keeps scrolling up. If you run it from a WSL command prompt, works great. I suspect there is a setting you need to apply when running directly from VS Code if the output is the integrated terminal window. – ejkitchen Sep 08 '22 at 19:01
6

It seems that there is a disconnect between the examples on the Progress Bar Usage page and what the code actually requires.

In the following example, note the use of maxval instead of max_value. Also note the use of .start() to initialize the bar. This has been noted in an Issue.

The n_chunk parameter denotes how many 1024-byte (1 KiB) blocks to stream at once while looping through the request iterator.

import requests
import time

import numpy as np

import progressbar


url = "http://wikipedia.com/"

def download_file(url, n_chunk=1):
    r = requests.get(url, stream=True)
    # Estimates the number of bar updates
    block_size = 1024
    file_size = int(r.headers.get('Content-Length', None))
    num_bars = np.ceil(file_size / (n_chunk * block_size))
    bar = progressbar.ProgressBar(maxval=num_bars).start()
    with open('test.html', 'wb') as f:
        for i, chunk in enumerate(r.iter_content(chunk_size=n_chunk * block_size)):
            f.write(chunk)
            bar.update(i+1)
            # Add a little sleep so you can see the bar progress
            time.sleep(0.05)
    return

download_file(url)

EDIT: Addressed comment about code clarity.
EDIT2: Fixed logic so bar reports 100% at completion. Credit to leovp's answer for using the 1 KiB block size.

andrew
5

The Python library enlighten can also be used; it is powerful, provides colorful progress bars, and works correctly on both Linux and Windows.

Below is the code plus a live screencast. The code can be run here on repl.it.

import math
import requests, enlighten

url = 'https://upload.wikimedia.org/wikipedia/commons/a/ae/Arthur_Streeton_-_Fire%27s_on_-_Google_Art_Project.jpg?download'
fname = 'image.jpg'

# Should be one global variable
MANAGER = enlighten.get_manager()

r = requests.get(url, stream = True)
assert r.status_code == 200, r.status_code
dlen = int(r.headers.get('Content-Length', '0')) or None

with MANAGER.counter(color = 'green', total = dlen and math.ceil(dlen / 2 ** 20), unit = 'MiB', leave = False) as ctr, \
     open(fname, 'wb', buffering = 2 ** 24) as f:
    for chunk in r.iter_content(chunk_size = 2 ** 20):
        print(chunk[-16:].hex().upper())
        f.write(chunk)
        ctr.update()

Output (+ ascii-video)

Arty
3

Here is an answer using tqdm.

import requests
from tqdm import tqdm


def download(url, fname):
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get('content-length', 0))
    with open(fname, 'wb') as file, tqdm(
            desc=fname,
            total=total,
            unit='iB',
            unit_scale=True,
            unit_divisor=1024,
    ) as bar:
        for data in resp.iter_content(chunk_size=1024):
            size = file.write(data)
            bar.update(size)

Gist: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51

Yan QiDong
2

It seems like you're going to need to get the remote file size (answered here) to calculate how far along you are.

You could then update your progress bar while processing each chunk... if you know the total size and the size of the chunk, you can figure out when to update the progress bar.
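
As a rough sketch of that idea (the URL and file name here are just placeholders): fetch Content-Length with a HEAD request first, then report a percentage after each chunk is written:

import requests

url = "http://www.ovh.net/files/10Mb.dat"  # placeholder URL
total = int(requests.head(url, allow_redirects=True).headers.get('content-length', 0))

downloaded = 0
with requests.get(url, stream=True) as r, open('out.dat', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        if total:
            # plain percentage; any progress bar could be updated here instead
            print(f"\r{100 * downloaded // total}% ({downloaded}/{total} bytes)", end='')
print()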

Community
1

Comparing the total file size with how much you have already downloaded tells you how far along you are. Or you could use tqdm.
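
For instance, a minimal sketch of that comparison using the progressbar2 package from the question (the URL is a placeholder):

import progressbar  # the progressbar2 package, as in the question
import requests

url = "http://www.ovh.net/files/10Mb.dat"  # placeholder URL

r = requests.get(url, stream=True)
total = int(r.headers.get('content-length', 0)) or progressbar.UnknownLength

bar = progressbar.ProgressBar(max_value=total)
downloaded = 0
with open('out.dat', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        bar.update(downloaded)  # bytes downloaded so far vs. the total
bar.finish()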

Hxdfgs
0

For some reason I couldn't get the file size with requests when working with zip files, so I used urllib to get it.

# A simple downloader with progress bar

import requests
from tqdm import tqdm
import zipfile
from urllib.request import urlopen

url = "https://web.cs.dal.ca/~juanr/downloads/malnis_dataset.zip"
block_size = 1024 #1 Kibibyte

filename = url.split("/")[-1]
print(f"Downloading {filename}...")
site = urlopen(url)
meta = site.info()
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes = int(meta["Content-Length"])
progress_bar = tqdm(total = total_size_in_bytes, unit='iB', unit_scale=True)
with open(filename, 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data))
        file.write(data)
progress_bar.close()
print("Download complete")
print(f"Extracting {filename}...")
with zipfile.ZipFile(filename, "r") as zip_file:
    zip_file.extractall()
print("Extracting complete")