40

I would like to write a simple script to iterate through all the files in a folder and unzip those that are zipped (.zip) to that same folder. For this project, I have a folder with nearly 100 zipped .las files and I'm hoping for an easy way to batch unzip them. I tried the following script:

import os, zipfile

folder = 'D:/GISData/LiDAR/SomeFolder'
extension = ".zip"

for item in os.listdir(folder):
    if item.endswith(extension):
        zipfile.ZipFile.extract(item)

However, when I run the script, I get the following error:

Traceback (most recent call last):
  File "D:/GISData/Tools/MO_Tools/BatchUnzip.py", line 10, in <module>
    extract = zipfile.ZipFile.extract(item)
TypeError: unbound method extract() must be called with ZipFile instance as first argument (got str instance instead)

I am using the python 2.7.5 interpreter. I looked at the documentation for the zipfile module (https://docs.python.org/2/library/zipfile.html#module-zipfile) and I would like to understand what I'm doing incorrectly.

I guess in my mind, the process would go something like this:

  1. Get folder name
  2. Loop through folder and find zip files
  3. Extract zip files to folder

Thanks, Marcus. However, when I implement the suggestion, I get another error:

Traceback (most recent call last):
  File "D:/GISData/Tools/MO_Tools/BatchUnzip.py", line 12, in <module>
    zipfile.ZipFile(item).extract()
  File "C:\Python27\ArcGIS10.2\lib\zipfile.py", line 752, in __init__
    self.fp = open(file, modeDict[mode])
IOError: [Errno 2] No such file or directory: 'JeffCity_0752.las.zip'

When I use print statements, I can see that the files are in there. For example:

for item in os.listdir(folder):
    if item.endswith(extension):
        print os.path.abspath(item)
        filename = os.path.basename(item)
        print filename

yields:

D:\GISData\Tools\MO_Tools\JeffCity_0752.las.zip
JeffCity_0752.las.zip
D:\GISData\Tools\MO_Tools\JeffCity_0753.las.zip
JeffCity_0753.las.zip

As I understand the documentation,

zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or a file-like object

It appears to me like everything is present and accounted for. I just don't understand what I'm doing wrong.

Any suggestions?

Thank You

tpdance

5 Answers

74

Below is the code that worked for me:

import os, zipfile

dir_name = 'C:\\SomeDirectory'
extension = ".zip"

os.chdir(dir_name) # change directory from working dir to dir with files

for item in os.listdir(dir_name): # loop through items in dir
    if item.endswith(extension): # check for ".zip" extension
        file_name = os.path.abspath(item) # get full path of files
        zip_ref = zipfile.ZipFile(file_name) # create zipfile object
        zip_ref.extractall(dir_name) # extract file to dir
        zip_ref.close() # close file
        os.remove(file_name) # delete zipped file

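Looking back at the code I had amended, the problem was that os.listdir() returns bare file names, so each relative name was being resolved against the current working directory (the script's folder) rather than the folder holding the zip files.

The following also works, without changing the working directory. First remove the line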

os.chdir(dir_name) # change directory from working dir to dir with files

Then assign file_name as

file_name = dir_name + "/" + item
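A minimal sketch of that second variant, for anyone who wants it in one piece. The helper name `unzip_all` and the `delete_after` flag are assumptions made for illustration and are not part of the answer above; the sketch also builds each path with `os.path.join` rather than string concatenation and never touches the working directory.

```python
import os
import zipfile


def unzip_all(dir_name, extension=".zip", delete_after=False):
    """Extract every archive in dir_name into dir_name.

    Hypothetical helper: returns the full paths of the archives processed.
    """
    extracted = []
    for item in sorted(os.listdir(dir_name)):  # bare names, not full paths
        if not item.endswith(extension):
            continue
        file_name = os.path.join(dir_name, item)  # full path, independent of the cwd
        with zipfile.ZipFile(file_name) as zip_ref:  # closed automatically
            zip_ref.extractall(dir_name)
        if delete_after:
            os.remove(file_name)  # only reached if extraction did not raise
        extracted.append(file_name)
    return extracted
```

Calling `unzip_all('D:/GISData/LiDAR/SomeFolder')` should leave the archives in place; pass `delete_after=True` to reproduce the `os.remove` behaviour of the code above.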
Chenlu
tpdance
  • Thanks for the explanations mate!! My problem is that all the files that I extract have the same filename inside and when I use extractall it directly smashes the files leaving just the last one. I should change the name of it, but I do not know how. @Chenlu – Borja_042 Jul 05 '17 at 08:04
  • 1
    @Borja_042 I would recommend creating a count variable, then adding that to the file name on extract. Inside the loop, append the count variable to the dir name. – tpdance Jul 21 '17 at 21:38
  • 1
    What if I want unzip `zip` files in folders and subfolders? – ah bon Nov 07 '19 at 01:35
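One possible reading of the count-variable suggestion in the comments, sketched under stated assumptions: each archive is extracted into its own numbered subfolder, so members that share a name no longer overwrite each other. The function name `unzip_to_numbered_dirs` and the `extracted_NNN` folder naming are hypothetical choices, not something the commenters specified.

```python
import os
import zipfile


def unzip_to_numbered_dirs(dir_name, extension=".zip"):
    """Extract each archive in dir_name into its own numbered subfolder.

    Hypothetical helper: returns the list of target folders, in order.
    """
    targets = []
    count = 0
    for item in sorted(os.listdir(dir_name)):
        if not item.endswith(extension):
            continue
        count += 1
        # Assumed naming scheme: extracted_001, extracted_002, ...
        target = os.path.join(dir_name, "extracted_%03d" % count)
        with zipfile.ZipFile(os.path.join(dir_name, item)) as zip_ref:
            zip_ref.extractall(target)  # missing folders are created as members are written
        targets.append(target)
    return targets
```

Using the archive's own name for the subfolder instead of a counter would work just as well and keeps the link between archive and output visible.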
23

I think this is shorter and worked fine for me. First import the modules required:

import zipfile, os

Then, I define the working directory:

working_directory = 'my_directory'
os.chdir(working_directory)

After that you can use a combination of the os and zipfile to get where you want:

for file in os.listdir('.'):   # list the current directory (set by os.chdir above)
    if zipfile.is_zipfile(file): # if it is a zipfile, extract it
        with zipfile.ZipFile(file) as item: # treat the file as a zip
            item.extractall()  # extract it in the working directory
Bondify
9

The accepted answer works great!

To extend the idea to every file with a .zip extension in all the sub-directories of a directory, the following code seems to work well (dir_path is the top-level directory to search):

import os
import zipfile

for path, dir_list, file_list in os.walk(dir_path):
    for file_name in file_list:
        if file_name.endswith(".zip"):
            abs_file_path = os.path.join(path, file_name)

            # The following three lines of code are only useful if
            # a. the zip file is to be unzipped in its parent folder and
            # b. inside a folder of the same name as the file

            parent_path = os.path.split(abs_file_path)[0]
            output_folder_name = os.path.splitext(abs_file_path)[0]
            output_path = os.path.join(parent_path, output_folder_name)

            zip_obj = zipfile.ZipFile(abs_file_path, 'r')
            zip_obj.extractall(output_path)
            zip_obj.close()
user11015000
Nicholas
4

You need to construct a ZipFile object with the filename, and then extract it:

    zipfile.ZipFile.extract(item)

is wrong.

    zipfile.ZipFile(item).extractall()

will extract all files from the zip file with the name contained in item.
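To make the difference concrete: `extract` and `extractall` are instance methods, so calling one on the `ZipFile` class itself passes the string in as `self`, which is what the TypeError complains about. Note also that `extract` expects the name of a member inside the archive, while `extractall` takes an optional destination folder and defaults to the current working directory. A rough sketch, assuming a hypothetical wrapper named `extract_archive`:

```python
import os
import zipfile


def extract_archive(zip_path, dest_dir=None):
    """Extract every member of zip_path and return the member names.

    Hypothetical helper: dest_dir defaults to the folder containing the archive.
    """
    if dest_dir is None:
        dest_dir = os.path.dirname(os.path.abspath(zip_path))
    with zipfile.ZipFile(zip_path) as archive:  # an instance, not the class
        names = archive.namelist()  # names of the members inside the archive
        archive.extractall(dest_dir)  # explicit destination instead of the cwd
    return names
```

Passing the destination explicitly avoids depending on wherever the script happens to be launched from.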

I think you should read the zipfile documentation more closely :) but you're on the right track!

Marcus Müller
  • 2
    The docs for `extractall` are at: https://python.readthedocs.io/en/latest/library/zipfile.html#zipfile.ZipFile.extractall – jsta Nov 12 '20 at 20:55
2

Recursive version of @tpdance's answer.

Use this for folders and their subfolders. Works on Python 3.8.

import os
import zipfile

base_dir = '/Users/john/data' # absolute path to the data folder
extension = ".zip"

os.chdir(base_dir)  # change directory from working dir to dir with files


def unpack_all_in_dir(_dir):
    for item in os.listdir(_dir):  # loop through items in dir
        abs_path = os.path.join(_dir, item)  # absolute path of dir or file
        if item.endswith(extension):  # check for ".zip" extension
            file_name = os.path.abspath(abs_path)  # get full path of file
            zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
            zip_ref.extractall(_dir)  # extract file to dir
            zip_ref.close()  # close file
            os.remove(file_name)  # delete zipped file
        elif os.path.isdir(abs_path):
            unpack_all_in_dir(abs_path)  # recurse this function with inner folder


unpack_all_in_dir(base_dir)
nlavr