
I am trying to zip multiple folders concurrently in Python, calling shutil.make_archive from one thread per folder. The smaller folder is zipped completely, but as soon as it finishes, the other thread stops zipping as well.

So, is shutil.make_archive thread safe?

xjcl
Praveen S

2 Answers


shutil.make_archive() is not thread-safe.

The reason is that it changes the current working directory, which is global to the process: threads do not have their own working directory. See the relevant code in Python 2.7:

save_cwd = os.getcwd()
if root_dir is not None:
    if logger is not None:
        logger.debug("changing into '%s'", root_dir)
    base_name = os.path.abspath(base_name)
    if not dry_run:
        os.chdir(root_dir)

if base_dir is None:
    base_dir = os.curdir
...

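The function saves the current working directory at the beginning of its execution and restores it before returning, but this is not enough for thread safety. While one thread is inside make_archive(), every other thread sees the changed directory too, so two concurrent calls overwrite each other's os.chdir(). When the call for the smaller folder finishes and restores the saved directory, the call still running in the other thread resolves its relative paths against the wrong directory, which would explain the behaviour described in the question.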
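
The underlying problem can be reproduced without shutil at all. The following is a minimal sketch using only the standard library; the helper name cwd_seen_by_main_after_thread_chdir is made up for illustration. It calls os.chdir() in a worker thread and then reports the working directory as seen from the calling thread:

```python
import os
import tempfile
import threading

def cwd_seen_by_main_after_thread_chdir(target_dir):
    """Hypothetical helper: chdir in a worker thread, report the caller's cwd."""
    original = os.getcwd()
    worker = threading.Thread(target=os.chdir, args=(target_dir,))
    worker.start()
    worker.join()
    try:
        # The working directory belongs to the process, not to a thread,
        # so the caller observes the worker's chdir.
        return os.getcwd()
    finally:
        os.chdir(original)  # restore so the demo has no lasting effect

with tempfile.TemporaryDirectory() as tmp:
    seen = cwd_seen_by_main_after_thread_chdir(tmp)
    # realpath() irons out symlinked temp directories (e.g. on macOS).
    print(os.path.realpath(seen) == os.path.realpath(tmp))
```

The calling thread never calls os.chdir(target_dir) itself, yet its working directory changes, which is the same interference two concurrent make_archive() calls inflict on each other.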

KovBal

Here is my thread-safe alternative to shutil.make_archive based on this answer:

import os
import zipfile

def make_archive_threadsafe(zip_name: str, path: str):
    # Never calls os.chdir(), so concurrent calls do not interfere with each other.
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                # Store each file under its path relative to `path`.
                zf.write(full_path, os.path.relpath(full_path, path))

make_archive_threadsafe('/root/backup.zip', '/root/some_data/')

Note that make_archive also uses ZipFile internally for the zip format, so this should be solid.

Unlike the answer I linked, this does not put the folder itself into the archive as a single top-level folder; the archive contains the folder's contents directly. That is a personal preference.

The code is Python 3, but it works in Python 2 if you remove the type annotations.
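
To come back to the original question, a function like this can be run from several threads at once. Here is a minimal sketch, assuming a thread pool from concurrent.futures (Python 3 only); zip_folders and its jobs mapping are illustrative names, and the archiving function is repeated so that the block is self-contained:

```python
import os
import zipfile
from concurrent.futures import ThreadPoolExecutor

def make_archive_threadsafe(zip_name, path):
    # Same approach as above: no os.chdir(), only explicit paths.
    with zipfile.ZipFile(zip_name, 'w', zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, path))

def zip_folders(jobs, max_workers=4):
    """Hypothetical helper: jobs maps zip file name -> folder to archive."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(make_archive_threadsafe, zip_name, folder)
                   for zip_name, folder in jobs.items()]
        for future in futures:
            future.result()  # re-raises any exception from the worker thread
```

An example call with placeholder paths would be zip_folders({'/root/a.zip': '/root/a/', '/root/b.zip': '/root/b/'}). It is safest to keep each zip file outside the folder it archives, since os.walk can otherwise pick up the half-written archive.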

xjcl