116

I have a Python script that launches a URL that is a downloadable file. Is there some way to have Python display the download progress as opposed to launching the browser?

user1607549
  • I'm probably late but you can use this library that is exactly what you want: https://pypi.org/project/Pretty-Downloader/0.0.2/ – DeadSec Feb 26 '21 at 19:02
  • I'm surprised that [tqdm](https://pypi.python.org/pypi/tqdm) has not been suggested! ([animated demo](https://i.stack.imgur.com/eJ21m.gif)) – kekkler Sep 01 '17 at 07:07
  • Something @DeadSec maybe should have disclosed is that they are the maintainer of said library – Enok82 Aug 18 '23 at 12:02

16 Answers

155

I've just written a super simple (slightly hacky) approach to this for scraping PDFs off a certain site. Note that it only works correctly on Unix-based systems (Linux, macOS), as PowerShell does not handle "\r":

import sys
import requests

link = "http://indy/abcde1245"
file_name = "download.data"
with open(file_name, "wb") as f:
    print("Downloading %s" % file_name)
    response = requests.get(link, stream=True)
    total_length = response.headers.get('content-length')

    if total_length is None: # no content length header
        f.write(response.content)
    else:
        dl = 0
        total_length = int(total_length)
        for data in response.iter_content(chunk_size=4096):
            dl += len(data)
            f.write(data)
            # 50-character bar: each '=' represents 2% of the download
            done = int(50 * dl / total_length)
            sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50-done)))
            sys.stdout.flush()

It uses the requests library so you'll need to install that. This outputs something like the following into your console:

>Downloading download.data

>[=============                            ]

The progress bar is 52 characters wide in the script (2 characters are simply the [] so 50 characters of progress). Each = represents 2% of the download.

Endophage
  • requests is not defined anywhere – user1607549 Mar 26 '13 at 19:12
  • I tried initializing it as requests = {} but that still didn't fix it. Not sure why? – user1607549 Mar 26 '13 at 19:39
  • @user1607549 from above "It uses the requests library so you'll need to install that." (sudo pip install requests) then `import requests` – Endophage Mar 26 '13 at 19:50
  • What is `pdf` in this example? I understand you're downloading pdf files, but is pdf a module? – EML Mar 26 '13 at 20:30
  • I have the same question, what is pdf? – user1607549 Mar 26 '13 at 21:07
  • @EML sorry, literally copy pasted out of my own script, renamed some variables to make it more generic, missed that one. I'll fix now, it was just a variable name. – Endophage Mar 26 '13 at 21:56
  • You may want to define chunk_size in iter_content so it won't be so slow. – 0942v8653 Jan 05 '15 at 18:39
  • As @0942v8653 mentions, iter_content() takes a chunk_size so you can specify it for speed, but also if the content you are downloading is small enough that ~1% of it can fit in memory, you could simplify your code a lot by doing chunk_size=total_length/100 and each iteration of the loop would be 1% of your download – cnelson Mar 13 '15 at 13:52
  • Worked for me on Windows. Also changed one line from `for data in response.iter_content():` to `for data in response.iter_content(chunk_size=total_length/100):`. – mrgloom May 06 '16 at 09:52
  • @mrgloom's solution also has one other advantage: if you don't define `chunk_size` it will be veeeeery slow (also @0942v8653 said this and here is a related GH issue: https://github.com/kennethreitz/requests/issues/2015). So always define chunk size to be something like 4096. – Grey Panther Sep 06 '16 at 04:17
  • Perfect. No additional package import needed. – fotinsky Dec 14 '17 at 13:31
  • It wasn't working until I've changed the `response = requests.get(link, stream=True)` bit into `response = requests.get(link + "/" + file_name, stream=True)`. As it is, it just downloads the webpage (for me at least). – nnsense Dec 25 '18 at 13:58
  • It's only sample code. The file_name is the local file name, the link is the URL to the file. If your URL happens to end with the same thing you want the local file to be called, that's incidental to your situation. – Endophage Jan 02 '19 at 08:50
  • Thanks, this was really useful. I suggest using `'\N{full block}'` instead of `'='` to make the progress bar look professional. That's what I did. But I admit, you have to be really smart to be able to come up with code like this. I have a question too. Is `sys.stdout.write` the same as `print` with `end=''`? – Pyzard Aug 10 '20 at 02:58
  • @Pyzard `print` has a bunch of parameters that make your life easier, like simply passing in an `int` or anything else and printing out many things at once. But basically, yes. – David Jul 12 '21 at 06:49
  • From what I know the command line on Windows 10 does support `\r` – Minek Po1 Dec 21 '21 at 10:19
  • @MinekPo1 Exactly, `\r` works on my windows 10 machine, both in `cmd` *and* in powershell. For example: `for i in range(n): print(f'\r{i/n*100:.1f}%', end='')` works like a charm. – djvg Jul 12 '22 at 16:14
  • [stdout.write vs print](https://stackoverflow.com/q/3263672) – djvg Jul 12 '22 at 16:19
83

You can use the 'clint' package (written by the same author as 'requests') to add a simple progress bar to your downloads like this:

import requests
from clint.textui import progress

r = requests.get(url, stream=True)  # url: the address of the file you want to download
path = '/some/path/for/file.txt'
with open(path, 'wb') as f:
    total_length = int(r.headers.get('content-length'))
    for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length/1024) + 1): 
        if chunk:
            f.write(chunk)
            f.flush()

which will give you a dynamic output which will look like this:

[################################] 5210/5210 - 00:00:01

It should work on multiple platforms as well! You can also change the bar to dots or a spinner with .dots and .mill instead of .bar.

Enjoy!

Rich Jones
61

Python 3 with TQDM

This is the suggested technique from the TQDM docs.

import urllib.request

from tqdm import tqdm


class DownloadProgressBar(tqdm):
    def update_to(self, b=1, bsize=1, tsize=None):
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)


def download_url(url, output_path):
    with DownloadProgressBar(unit='B', unit_scale=True,
                             miniters=1, desc=url.split('/')[-1]) as t:
        urllib.request.urlretrieve(url, filename=output_path, reporthook=t.update_to)
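
For reference, a call might look like this (the URL and output filename here are just made-up examples):

if __name__ == '__main__':
    # hypothetical example; any direct file URL works
    download_url('https://example.com/archive.zip', 'archive.zip')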
Chris Chute
33

Here is an approach with requests and tqdm.

import requests
from tqdm import tqdm


def download(url: str, fname: str):
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get('content-length', 0))
    # Can also replace 'file' with a io.BytesIO object
    with open(fname, 'wb') as file, tqdm(
        desc=fname,
        total=total,
        unit='iB',
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        for data in resp.iter_content(chunk_size=1024):
            size = file.write(data)
            bar.update(size)

Gist: https://gist.github.com/yanqd0/c13ed29e29432e3cf3e7c38467f42f51
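
As the comment in the code hints, the same idea works with an in-memory buffer instead of a file on disk. A rough sketch of that variant (same tqdm settings, hypothetical URL):

import io

import requests
from tqdm import tqdm


def download_to_buffer(url: str) -> bytes:
    resp = requests.get(url, stream=True)
    total = int(resp.headers.get('content-length', 0))
    buffer = io.BytesIO()
    with tqdm(desc=url.split('/')[-1], total=total, unit='iB',
              unit_scale=True, unit_divisor=1024) as bar:
        for data in resp.iter_content(chunk_size=1024):
            # BytesIO.write returns the number of bytes written, like file.write
            bar.update(buffer.write(data))
    return buffer.getvalue()

data = download_to_buffer('https://example.com/file.bin')  # hypothetical URL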

Yan QiDong
11

Another good option is wget:

import wget
wget.download('http://download.geonames.org/export/zip/US.zip')

The output will look like this:

11% [........                                     ] 73728 / 633847

Source: https://medium.com/@petehouston/download-files-with-progress-in-python-96f14f6417a2
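
If I remember the package correctly, wget.download also takes an out argument, so you can pick the local path yourself (the path below is just an example):

import wget

# 'out' names the local file; leave it off to keep the remote filename
wget.download('http://download.geonames.org/export/zip/US.zip', out='/tmp/US.zip')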

Eric Grinstein
9

You can also use click. It has a nice progress bar helper; note that `files`, `total_size` and `download()` below are placeholders for your own download logic (a more complete sketch follows the snippet):

import click

with click.progressbar(length=total_size, label='Downloading files') as bar:
    for file in files:
        download(file)
        bar.update(file.size)
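
As mentioned above, here is a more self-contained sketch tying click.progressbar to a single streamed requests download (the URL and filename are made up):

import click
import requests

url = 'https://example.com/big_file.zip'  # hypothetical URL
r = requests.get(url, stream=True)
total_size = int(r.headers.get('content-length', 0))

with open('big_file.zip', 'wb') as f, \
     click.progressbar(length=total_size, label='Downloading file') as bar:
    for chunk in r.iter_content(chunk_size=4096):
        f.write(chunk)
        bar.update(len(chunk))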
Tian Zhang
8

Sorry for being late with an answer; just updated the tqdm docs:

https://github.com/tqdm/tqdm/#hooks-and-callbacks

Using urllib.urlretrieve and OOP:

import urllib.request
from tqdm.auto import tqdm

class TqdmUpTo(tqdm):
    """Provides `update_to(n)` which uses `tqdm.update(delta_n)`."""
    def update_to(self, b=1, bsize=1, tsize=None):
        """
        b  : Blocks transferred so far
        bsize  : Size of each block
        tsize  : Total size
        """
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)  # will also set self.n = b * bsize

eg_link = "https://github.com/tqdm/tqdm/releases/download/v4.46.0/tqdm-4.46.0-py2.py3-none-any.whl"
eg_file = eg_link.split('/')[-1]
with TqdmUpTo(unit='B', unit_scale=True, unit_divisor=1024, miniters=1,
              desc=eg_file) as t:  # all optional kwargs
    urllib.request.urlretrieve(
        eg_link, filename=eg_file, reporthook=t.update_to, data=None)
    t.total = t.n

or using requests.get and file wrappers:

import requests
from tqdm.auto import tqdm

eg_link = "https://github.com/tqdm/tqdm/releases/download/v4.46.0/tqdm-4.46.0-py2.py3-none-any.whl"
eg_file = eg_link.split('/')[-1]
response = requests.get(eg_link, stream=True)
with tqdm.wrapattr(open(eg_file, "wb"), "write", miniters=1,
                   total=int(response.headers.get('content-length', 0)),
                   desc=eg_file) as fout:
    for chunk in response.iter_content(chunk_size=4096):
        fout.write(chunk)

You could of course mix & match techniques.

casper.dcl
4

# Define Progress Bar function

def print_progressbar(total, current, barsize=60):
    progress = int(current * barsize / total)
    completed = str(int(current * 100 / total)) + '%'
    print('[', chr(9608)*progress, ' ', completed, '.'*(barsize-progress), '] ',
          str(current) + '/' + str(total), sep='', end='\r', flush=True)

# Sample Code

total = 6000
barsize = 60
print_frequency = max(min(total//barsize, 100), 1)
print("Start Task..", flush=True)
for i in range(1, total+1):
    if i % print_frequency == 0 or i == 1:
        print_progressbar(total, i, barsize)
print("\nFinished", flush=True)

# Snapshot of Progress Bar :

The lines below are for illustration only. In the command prompt you will see a single progress bar updating in place.

[ 0%............................................................] 1/6000

[██████████ 16%..................................................] 1000/6000

[████████████████████ 33%........................................] 2000/6000

[██████████████████████████████ 50%..............................] 3000/6000

[████████████████████████████████████████ 66%....................] 4000/6000

[██████████████████████████████████████████████████ 83%..........] 5000/6000

[████████████████████████████████████████████████████████████ 100%] 6000/6000
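
To tie this back to the original question, the same function can track a streamed download. A rough sketch using requests (the URL is hypothetical, and it assumes the server sends a Content-Length header):

import requests

url = 'https://example.com/some_file.bin'  # hypothetical URL
response = requests.get(url, stream=True)
total = int(response.headers.get('content-length', 0))

downloaded = 0
with open('some_file.bin', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
        downloaded += len(chunk)
        if total:
            print_progressbar(total, downloaded)  # progress measured in bytes
print("\nFinished", flush=True)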
Tomerikoo
4

The tqdm package now includes a function designed to handle exactly this type of situation: wrapattr. You just wrap an object's read (or write) attribute, and tqdm handles the rest. Here's a simple download function that puts it all together with requests:

def download(url, filename):
    import functools
    import pathlib
    import shutil
    import requests
    import tqdm
    
    r = requests.get(url, stream=True, allow_redirects=True)
    if r.status_code != 200:
        r.raise_for_status()  # Will only raise for 4xx/5xx codes, so...
        raise RuntimeError(f"Request to {url} returned status code {r.status_code}")
    file_size = int(r.headers.get('Content-Length', 0))

    path = pathlib.Path(filename).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)

    desc = "(Unknown total file size)" if file_size == 0 else ""
    r.raw.read = functools.partial(r.raw.read, decode_content=True)  # Decompress if needed
    with tqdm.tqdm.wrapattr(r.raw, "read", total=file_size, desc=desc) as r_raw:
        with path.open("wb") as f:
            shutil.copyfileobj(r_raw, f)

    return path
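
A hypothetical call (URL and destination are just examples; thanks to expanduser, ~ works in the path):

saved = download("https://example.com/dataset.tar.gz", "~/Downloads/dataset.tar.gz")
print(f"Saved to {saved}")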
Mike
0

Just some improvements on @rich-jones's answer:

import re
import requests
from clint.textui import progress

def get_filename(cd):
    """
    Get filename from content-disposition
    """
    if not cd:
        return None
    fname = re.findall('filename=(.+)', cd)
    if len(fname) == 0:
        return None
    return fname[0].replace('"', "")

def stream_download_file(url, output, chunk_size=1024, session=None, verbose=False):
    
    if session:
        file = session.get(url, stream=True)
    else:
        file = requests.get(url, stream=True)
        
    file_name = get_filename(file.headers.get('content-disposition'))
    filepath = "{}/{}".format(output, file_name)
    
    if verbose: 
        print ("Downloading {}".format(file_name))
        
    with open(filepath, 'wb') as f:
        total_length = int(file.headers.get('content-length'))
        for chunk in progress.bar(file.iter_content(chunk_size=chunk_size), expected_size=(total_length/chunk_size) + 1): 
            if chunk:
                f.write(chunk)
                f.flush()
    if verbose: 
        print ("Finished")
Ehsan Ahmadi
0

I came up with a solution that looks a bit nicer, based on tqdm. My implementation is based on @Endophage's answer.

The effect:

# import the download_file definition from the next cell first.
>>> download_file(url, 'some_data.dat')
Downloading some_data.dat.
  7%|█▎                  | 195.31MB/2.82GB:  [00:04<01:02, 49.61MB/s]

The implementation:

import time
import math
import requests
from tqdm import tqdm


def download_file(url, filename, update_interval=500, chunk_size=4096):
    def memory2str(mem):
        sizes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
        power = int(math.log(mem, 1024))
        size = sizes[power]
        for _ in range(power):
            mem /= 1024
        if power > 0:
            return f'{mem:.2f}{size}'
        else:
            return f'{mem}{size}'
    with open(filename, 'wb') as f:
        response = requests.get(url, stream=True)
        total_length = response.headers.get('content-length')
        if total_length is None:
            f.write(response.content)
        else:
            print(f'Downloading {filename}.', flush=True)
            downloaded, total_length = 0, int(total_length)
            total_size = memory2str(total_length)
            bar_format = '{percentage:3.0f}%|{bar:20}| {desc} [{elapsed}<{remaining}' \
                         '{postfix}]'
            if update_interval * chunk_size * 100 >= total_length:
                update_interval = 1
            with tqdm(total=total_length, bar_format=bar_format) as bar:
                counter = 0
                now_time, now_size = time.time(), downloaded
                for data in response.iter_content(chunk_size=chunk_size):
                    f.write(data)
                    downloaded += len(data)
                    counter += 1
                    bar.update(len(data))
                    if counter % update_interval == 0:
                        elapsed = time.time() - now_time
                        runtime_downloaded = downloaded - now_size
                        now_time, now_size = time.time(), downloaded

                        cur_size = memory2str(downloaded)
                        speed_size = memory2str(runtime_downloaded / elapsed)
                        bar.set_description(f'{cur_size}/{total_size}')
                        bar.set_postfix_str(f'{speed_size}/s')

                        counter = 0
Han Zhang
0

A simple solution with the wget and tqdm Python libraries that shows progress in megabytes and the remaining time:

MB:  37%|███▋      | 2044.8/5588.7 [02:57<04:30, 13.11it/s]
  • Install the libraries: pip3 install wget tqdm

  • Import libraries

    import wget
    from tqdm import tqdm
    
  • Wrapper class for tqdm

    class ProgressBar:
    
      def __init__(self):
          self.progress_bar = None
    
      def __call__(self, current_bytes, total_bytes, width):
          current_mb = round(current_bytes / 1024 ** 2, 1)
          total_mb = round(total_bytes / 1024 ** 2, 1)
          if self.progress_bar is None:
              self.progress_bar = tqdm(total=total_mb, desc="MB")
          delta_mb = current_mb - self.progress_bar.n
          self.progress_bar.update(delta_mb)
    
  • How to use it

    wget.download(url, dst_filepath, ProgressBar())
    
user1315621
0

Here is the "Goat Progress bar" implementation from George Hotz.

import requests
from tqdm import tqdm

r = requests.get(url, stream=True)  # url: the file you want to download
progress_bar = tqdm(total=int(r.headers.get('content-length', 0)), unit='B', unit_scale=True, desc=url)
dat = b''.join(x for x in r.iter_content(chunk_size=16384) if progress_bar.update(len(x)) or True)

cc: https://github.com/geohot/tinygrad/commit/7118602c976d264d97af3c1c8b97d72077616d07
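
That one-liner only leaves the bytes in dat, so you'd typically write them out afterwards, e.g.:

with open("downloaded_file.bin", "wb") as f:  # hypothetical local filename
    f.write(dat)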

0

You can easily use the dlbar module:

python3 -m pip install dlbar

Just import it and call the download method:

from dlbar import DownloadBar

download_bar = DownloadBar()

download_bar.download(
    url='https://url',
    dest='/a/b/c/downloaded_file.suffix',
    title='Downloading downloaded_file.suffix'
)

Output:

Downloading downloaded_file.suffix
43% █████████████████████----------------------------- 197.777 MB/450.327 MB

You can also customize the download bar. See here for more information.

mimseyedi
0

I modified the many great suggestions to suit my situation.

I needed to download a large .txt file (>2.5 GB). Each line in the text file contains a unique paragraph, and I needed to retrieve a list of paragraphs from the file.

Be aware that the following code is not 100% bulletproof: the chunks might not end exactly at a paragraph boundary, so a paragraph can occasionally be split in two. In my case that was not an issue. Increasing the chunk_size reduces the number of "corrupt" paragraphs.

import requests
from tqdm import tqdm

def DownloadFile(url):
    req = requests.get(url, stream=True)
    total_length = int(req.headers.get('content-length'))
    chunk_size = 4194304  # 4 MB
    steps = total_length / chunk_size
    data = []
    for chunk in tqdm(req.iter_content(chunk_size=chunk_size), total=steps):
        text = chunk.decode("utf-8", "ignore")
        for line in text.split("\n"):
            data.append(line.rstrip())
    return data
Steffen
-1

You can stream a download as described here -> Stream a Download.

Also you can Stream Uploads.

Most importantly, with streaming the body is not actually downloaded until you try to access response.content; you can process it as it arrives with just a few lines:

import requests

r = requests.get(url, stream=True)  # url: the resource you want to stream
for line in r.iter_lines():
    if line:
        print(line)

Stream Requests