Extract ZipFile using Python, display Progress Percentage?

Question

I know how to extract a zip archive using Python, but how exactly do I display the progress of that extraction in a percentage?

Tkinter, if that's what you're talking about. All I need is to be able to display the text percentage. — Zac Brown, Dec 03 '10 at 01:06
A somewhat dirty workaround is to spawn the extraction in a separate process, monitor the files being extracted from the main thread, sum their sizes and divide by `ZipInfo.file_size` — Novikov, Dec 03 '10 at 01:26

Anwarvic · Answer 1 · 2021-10-04T16:43:51.500

18

I suggest using tqdm, you can install it using pip like so:

pip install tqdm

Then, you can use it directly like so:

>>> import zipfile
>>> from tqdm import tqdm
>>>
>>> with zipfile.ZipFile(some_source) as zf:
...     for member in tqdm(zf.infolist(), desc='Extracting '):
...         try:
...             zf.extract(member, target_path)
...         except zipfile.error as e:
...             pass

This will produce something like so:

Extracting : 100%|██████████| 60.0k/60.0k [14:56<00:00, 66.9File/s]

edited Oct 04 '21 at 16:43

answered Jul 10 '19 at 12:19


Is it possible to get the percent value from tqdm, so that I can implement my progress bar like a windows GUI by updating its value? i.e., To display custom progress for the extraction of a zip file. – Phani Rithvij Mar 01 '20 at 09:48
If you want just the percentage, then you don't need `tqdm`. You can find the percentage easily knowing the file index. For example, assume that you have 100 files, and now you are extracting the 4th file, then you are done by 4%. Simple!! – Anwarvic Mar 01 '20 at 10:07
@Anwarvic if I have 1 100MiB file and 99 few KiB files then that's not how that works at all... – Yamirui Sep 20 '20 at 20:54
Thanks @Anwarvic! Also you could import it like this: **from tqdm import tqdm**. – Pedro P. Camellon Oct 04 '21 at 14:42
@PedroP.CamellónQ. You're absolutely right! I edited my answer – Anwarvic Oct 04 '21 at 16:43
I used this variation today: `for member in tqdm(iterable=zf.namelist(), total=len(zf.namelist()), desc='Extracting '):` – Pedro P. Camellon Oct 05 '21 at 05:46
@PedroP.CamellónQ. please, can you explain the difference with your variation?? To whoever may relate to it, I notice that using `extractall()` instead of `extract()` won't update the progress bar. – Selfcontrol7 Feb 04 '22 at 04:34
`extractall()` extracts all content of the zipfile while `extract` extracts just one member. So, this is how the code works. 1) Read the zip file. 2) iterate over all members using the `infolist()` method 3) extract each member while updating the progress bar. And if there is an error, ignore it. I hope this helps – Anwarvic Feb 04 '22 at 11:56
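
To illustrate the point raised in the comments about archives with one huge file and many tiny ones: a minimal sketch, using only the standard library, that still extracts member by member but weights the percentage by each member's uncompressed size (`ZipInfo.file_size`) instead of by file count. The function name and the `report` callback are made up for this example; progress only moves when a member finishes, so a single very large file still shows as one jump.

```python
import zipfile


def extract_with_size_weighted_progress(zip_path, target_dir, report):
    """Extract every member, reporting the percent of uncompressed bytes done.

    `report` is a hypothetical callback that receives a float from 0 to 100.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.infolist()
        total = sum(m.file_size for m in members)
        done = 0
        for member in members:
            zf.extract(member, target_dir)
            done += member.file_size
            # An archive holding only empty files or directories has total == 0.
            report(100.0 * done / total if total else 100.0)
```

Passing `print` as `report` prints the values to the console; a GUI could pass a function that updates a label instead.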

Dan D. · Accepted Answer · 2010-12-03T02:21:46.397

9

The extract method doesn't provide a callback for this, so you would have to use getinfo to get the uncompressed size, then open the entry, read from it in blocks, write each block to the place you want the file to go, and update the percentage as you go. You would also have to restore the mtime if that is wanted. An example:

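import zipfile

# some_source, entry_name, target_name and block_size are placeholders;
# set_percentage and set_attributes_from stand for your own helpers.
z = zipfile.ZipFile(some_source)
entry_info = z.getinfo(entry_name)
i = z.open(entry_name)
o = open(target_name, 'wb')  # binary mode: read() returns bytes
offset = 0
while True:
    b = i.read(block_size)
    if not b:  # an empty read means the end of the entry
        break
    o.write(b)
    offset += len(b)
    set_percentage(float(offset) / float(entry_info.file_size) * 100.)
i.close()
o.close()
set_attributes_from(entry_info)  # e.g. restore the mtime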

This extracts entry_name to target_name.

Most of this is also done by shutil.copyfileobj, but it doesn't have a callback for progress either.

The ZipFile.extract method calls _extract_member, whose source uses:

source = self.open(member, pwd=pwd)
target = file(targetpath, "wb")
shutil.copyfileobj(source, target)
source.close()
target.close()

where member has been converted from a name to a ZipInfo object by getinfo(member) if it wasn't already a ZipInfo object.
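
For anyone on Python 3, here is a hedged sketch of the same block-copy idea wrapped in a function, including one possible way to restore the mtime. `extract_entry` and `on_progress` are names invented for this example, and treating `ZipInfo.date_time` as local time (via `time.mktime`) is an assumption that suits typical archives; only the modification time is restored, not permissions.

```python
import os
import time
import zipfile


def extract_entry(zip_path, entry_name, target_name, on_progress, block_size=64 * 1024):
    """Copy one archive entry to target_name, reporting percent after each block."""
    with zipfile.ZipFile(zip_path) as z:
        info = z.getinfo(entry_name)
        written = 0
        with z.open(info) as src, open(target_name, "wb") as dst:
            while True:
                block = src.read(block_size)
                if not block:  # empty read: end of entry
                    break
                dst.write(block)
                written += len(block)
                on_progress(100.0 * written / info.file_size)
        if info.file_size == 0:
            on_progress(100.0)  # empty entry: the loop never reported anything
    # date_time is a (Y, M, D, h, m, s) tuple; the trailing -1 lets mktime decide on DST.
    mtime = time.mktime(info.date_time + (0, 0, -1))
    os.utime(target_name, (mtime, mtime))
```

The parent directory of `target_name` has to exist already, the same as in the snippet above.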

edited Dec 03 '10 at 02:21

answered Dec 03 '10 at 01:29


OK, cool. I like this. Only thing is, my archive contains folders, but for some reason it won't extract them. It raises the exception stating that the file doesn't exist. – Zac Brown Dec 03 '10 at 02:07
folders don't exist in zip files as the file entries names are path names i.e `some/path/to/some/file` would be the name of a file and there are no entries for the directories – Dan D. Dec 03 '10 at 02:10
I got it. I used the extract method in the zipfile module... along with some use of the OS module. Thanks. – Zac Brown Dec 03 '10 at 02:17
I'm not quite understanding your example above. Where exactly is the extraction taking place? – Zac Brown Dec 03 '10 at 02:18
@ZacBrown b = i.read(block_size) # extract from the zip – Amnon Sep 12 '22 at 19:24

casper.dcl · Answer 3 · 2020-12-30T22:34:58.113

Sorry, a bit late seeing this. I had a similar problem, needing an equivalent to zipfile.ZipFile.extractall. If you have tqdm>=4.40.0 (which I released over a year ago), then:

from os import fspath
from pathlib import Path
from shutil import copyfileobj
from zipfile import ZipFile
from tqdm.auto import tqdm  # could use from tqdm.gui import tqdm
from tqdm.utils import CallbackIOWrapper

def extractall(fzip, dest, desc="Extracting"):
    """zipfile.ZipFile(fzip).extractall(dest) with progress"""
    dest = Path(dest).expanduser()
    with ZipFile(fzip) as zipf, tqdm(
        desc=desc, unit="B", unit_scale=True, unit_divisor=1024,
        total=sum(getattr(i, "file_size", 0) for i in zipf.infolist()),
    ) as pbar:
        for i in zipf.infolist():
            if not getattr(i, "file_size", 0):  # directory (or empty file)
                zipf.extract(i, fspath(dest))
            else:
                # the archive may lack explicit directory entries
                (dest / i.filename).parent.mkdir(parents=True, exist_ok=True)
                with zipf.open(i) as fi, open(fspath(dest / i.filename), "wb") as fo:
                    copyfileobj(CallbackIOWrapper(pbar.update, fi), fo)

Amnon · Answer 4 · 2022-09-14T10:36:13.807

For the lazy, below is a self-contained working example based on Dan D's answer. Tested on Python 3.10.6. Not optimized, but works.

In this example, the assumption is that the target "test" directory exists, but you can of course create it in the extract function.

The advantage of Dan's answer over most of the answers I've seen on this topic is that it reports progress within each file. Updating the progress only once per extracted file does not achieve the goal if the archive consists of very large files.

import zipfile
import os
from pathlib import Path

def extract(zip_path, target_path):
    block_size = 8192
    with zipfile.ZipFile(zip_path) as z:
        for entry_name in z.namelist():
            print(entry_name)
            if entry_name.endswith('/'):
                continue  # directory entry: nothing to copy
            entry_info = z.getinfo(entry_name)
            # create the parent directory inside target_path if needed
            dir_name = os.path.dirname(entry_name)
            Path(f"{target_path}/{dir_name}").mkdir(parents=True, exist_ok=True)
            offset = 0
            with z.open(entry_name) as i, open(f"{target_path}/{entry_name}", 'wb') as o:
                while True:
                    b = i.read(block_size)
                    if not b:
                        break
                    o.write(b)
                    offset += len(b)
                    # percentage of this entry written so far
                    print(float(offset) / float(entry_info.file_size) * 100.)

extract("test.zip", "test")
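
If a single percentage for the whole archive is wanted (for example to feed a Tkinter label, as mentioned in the question), the same block loop can be summed over all entries. This is only a sketch under a few assumptions: `extract_all_with_percent` and `on_percent` are names made up here, the archive is trusted (member names are joined to the target path without the sanitising that `ZipFile.extract` does), and the callback fires only when the integer percentage changes so a GUI is not flooded with updates.

```python
import os
import zipfile


def extract_all_with_percent(zip_path, target_dir, on_percent, block_size=1024 * 1024):
    """Extract the whole archive, calling on_percent(int) as the total advances."""
    with zipfile.ZipFile(zip_path) as z:
        infos = z.infolist()
        total = sum(info.file_size for info in infos)  # uncompressed bytes
        done = 0
        last = -1
        for info in infos:
            # Assumption: trusted archive, so the member name is used as-is.
            dest = os.path.join(target_dir, info.filename)
            if info.is_dir():
                os.makedirs(dest, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
            with z.open(info) as src, open(dest, "wb") as dst:
                # iter() with a sentinel keeps reading until an empty block
                for block in iter(lambda: src.read(block_size), b""):
                    dst.write(block)
                    done += len(block)
                    percent = 100 * done // total
                    if percent != last:
                        on_percent(percent)
                        last = percent
        if last != 100:
            on_percent(100)  # covers archives with no data bytes at all
```

Called as `extract_all_with_percent("test.zip", "test", print)`, it prints whole-number percentages for the archive as a whole rather than per file.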

Mohamed Omar · Answer 5 · 2023-01-28T20:42:30.503

0

import zipfile

srcZipFile = 'srcZipFile.zip'
distZipFile = 'distZipFile'  # destination directory
with zipfile.ZipFile(srcZipFile) as zf:
    filesList = zf.namelist()
    for idx, file in enumerate(filesList, start=1):
        zf.extract(file, distZipFile)
        # percentage of archive members extracted so far
        percent = round((idx / len(filesList)) * 100)
        print(percent)

edited Jan 28 '23 at 20:42

answered Jan 28 '23 at 20:03

