
I'm trying to write a function for my script that zips the contents of a given source directory (src) into a zip file (dst). For example, zip('/path/to/dir', '/path/to/file.zip'), where /path/to/dir is a directory and /path/to/file.zip doesn't exist yet. I do not want to zip the directory itself; that makes all the difference in my case. I want to zip the files (and subdirectories) inside the directory. This is what I'm trying:

def zip(src, dst):
    zf = zipfile.ZipFile("%s.zip" % (dst), "w")
    for dirname, subdirs, files in os.walk(src):
        zf.write(dirname)
        for filename in files:
            zf.write(os.path.join(dirname, filename))
    zf.close()

This creates a zip that essentially mirrors / (the filesystem root). For example, if I zip /path/to/dir, extracting the archive creates a directory "path" containing "to", and so on.

Does anyone have a function that doesn't cause this problem?

I can't stress this enough: it needs to zip the files in the directory, not the directory itself.

Dharman
tkbx
  • Possible duplicate of [How to create a zip archive of a directory](https://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory) – BuZZ-dEE Apr 11 '18 at 21:35

3 Answers


The ZipFile.write() method takes an optional arcname argument that specifies the name the file should have inside the zip file.

You can use this to strip off the path to src at the beginning. Here I use os.path.abspath() to make sure that both src and the filename returned by os.walk() have a common prefix.

#!/usr/bin/env python2.7

import os
import zipfile

def zip(src, dst):
    zf = zipfile.ZipFile("%s.zip" % (dst), "w", zipfile.ZIP_DEFLATED)
    abs_src = os.path.abspath(src)
    for dirname, subdirs, files in os.walk(src):
        for filename in files:
            absname = os.path.abspath(os.path.join(dirname, filename))
            arcname = absname[len(abs_src) + 1:]
            print 'zipping %s as %s' % (os.path.join(dirname, filename),
                                        arcname)
            zf.write(absname, arcname)
    zf.close()

zip("src", "dst")

With a directory structure like this:

src
└── a
    ├── b
    │   └── bar
    └── foo

The script prints:

zipping src/a/foo as a/foo
zipping src/a/b/bar as a/b/bar

And the contents of the resulting zip file are:

Archive:  dst.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  01-28-13 11:36   a/foo
        0  01-28-13 11:36   a/b/bar
 --------                   -------
        0                   2 files
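
For readers on Python 3, the same idea can be sketched with os.path.relpath() instead of slicing off the prefix by hand. This is only a sketch under a few assumptions: the function name zip_dir_contents is made up for illustration, dst is treated as the complete file name (no ".zip" is appended, unlike the script above), and dst is assumed to live outside src so the archive does not try to include itself.

```python
import os
import zipfile


def zip_dir_contents(src, dst):
    """Zip the contents of directory src (not src itself) into the file dst."""
    # The with-block closes the archive even if an error occurs while writing.
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(src):
            for filename in files:
                full_path = os.path.join(dirname, filename)
                # Name inside the archive: the file's path relative to src.
                arcname = os.path.relpath(full_path, src)
                zf.write(full_path, arcname)
```

Calling zip_dir_contents("src", "dst.zip") on the directory tree shown above should store the two files as a/foo and a/b/bar, just like the original script.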
andrewdotn
  • Looks promising (EDIT: works perfectly), but is there any reason to import `os` and `os.path`? – tkbx Jan 28 '13 at 18:52
  • Yes—`os` for `os.walk()`, and `os.path` for `os.path.abspath()` and `os.path.join()`. – andrewdotn Jan 28 '13 at 18:55
  • @andrew you have access to `os.path` without importing it separately. The main reason to do this is if you rename it like `import os.path as P` or whatever. – agoebel Jan 28 '13 at 18:58
  • @agoebel also, while we're at it, what's the difference between `import os.path` and `from os import path` – tkbx Jan 28 '13 at 19:02
  • @tkbx: `from os import path` puts `path` at the top level, so you can do `path.join` instead of `os.path.join`. This is usually not what you want to do (especially since everyone always has a variable named `path` somewhere in their code). – abarnert Jan 28 '13 at 19:49
  • @tkbx Huh, for some reason I thought I needed to import `os.path` separately, but you’re quite right, I don’t need to. That will save at least a line on most scripts I write in the future—thanks! – andrewdotn Jan 28 '13 at 22:55
  • @abarnert that's why it's `script, vars = argv` when you `from sys import argv`. So, could you instead do `import sys.argv` and `script, vars = sys.argv`? – tkbx Jan 29 '13 at 00:44
  • @tkbx: No, you can't `import sys.argv` unless `argv` is a sub-module under `sys`. But `argv` isn't a module, it's just a `list`. But when you `import sys`—which is the normal thing you do most of the time—you then do `script, vars = sys.argv`. (Although really, you wouldn't write _that_ very often, either, because you'll get a `ValueError` if there are 0 or 2 command-line arguments.) – abarnert Jan 29 '13 at 00:54
  • @tkbx: Read [More on Modules](http://docs.python.org/2/tutorial/modules.html#more-on-modules) from the tutorial. – abarnert Jan 29 '13 at 01:07
  • This function works fine, but it will not add empty folders to the zip file, which in most cases is the expected behavior. In other words, any sub-folder without a file in it will be ignored. – bobyuan Oct 21 '15 at 07:24

From what I can tell you are close. You could use dirname and basename to make sure you are grabbing the right path name:

>>> os.path.dirname("/path/to/dst")
'/path/to'
>>> os.path.basename("/path/to/dst")
'dst'

Then using chdir you can make sure you are in the parent so the paths are relative.

def zip(src, dst):
    parent = os.path.dirname(dst)
    folder = os.path.basename(dst)

    os.chdir(parent)
    for dirname, subdirs, filenames in os.walk(folder):
        ...

This creates:

dst/a.txt
dst/b
dst/b/c.txt
...etc...

If you do not want to include the name "dst", you can just do os.chdir(dst) and then os.walk('.').
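
A rough sketch of that chdir variant, written for Python 3, might look like the following. The function name zip_from_inside is invented for illustration, and the sketch changes into the source directory (called src here, matching the question's signature). Because os.chdir() affects the whole process, the previous working directory is restored afterwards, and dst is made absolute first so a relative dst still points at the intended location.

```python
import os
import zipfile


def zip_from_inside(src, dst):
    """Zip the contents of src into dst by walking '.' from inside src."""
    # Resolve dst before changing directory; a relative path would
    # otherwise be interpreted relative to src.
    dst = os.path.abspath(dst)
    old_cwd = os.getcwd()
    os.chdir(src)
    try:
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
            for dirname, subdirs, filenames in os.walk("."):
                for filename in filenames:
                    # Paths look like "./a/foo"; normpath drops the "./".
                    zf.write(os.path.normpath(os.path.join(dirname, filename)))
    finally:
        # Restore the working directory even if zipping fails.
        os.chdir(old_cwd)
```

Since the walk starts at '.', every stored name is already relative to the source directory and no arcname is needed.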

Hope that helps.

agoebel
  • Please note that `zip` is also a builtin function with very different purpose: https://docs.python.org/2/library/functions.html#zip – Matteo T. Mar 26 '17 at 17:08

Use the arcname parameter to control the name/path in the zip file.

For example, for a zip file that contains only files, no directories:

zf.write(os.path.join(dirname, filename), arcname=filename)

Or to invent a new directory inside the zip file:

zf.write(os.path.join(dirname, filename), arcname=os.path.join("my_zip_dir", filename))
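
To show both calls in context, here is a small Python 3 sketch that wraps them in one function. The name zip_flat and its prefix parameter are hypothetical, not part of zipfile. With an empty prefix every file is stored by its base name only; with a prefix such as "my_zip_dir" the files land under that invented directory.

```python
import os
import zipfile


def zip_flat(src, dst, prefix=""):
    """Store every file under src by base name, optionally under prefix."""
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(src):
            for filename in files:
                # os.path.join("", filename) is just filename, so an empty
                # prefix yields a flat archive with no directories.
                zf.write(os.path.join(dirname, filename),
                         arcname=os.path.join(prefix, filename))
```

One caveat with flattening: two files with the same name in different sub-folders end up with the same name in the archive, so this only suits trees where base names are unique.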
Jon-Eric