I am currently using some code that downloads CSV data in zip files for each month of each year. The files are downloaded and then stored like this:

folders

Currently these folders are just on my desktop

Once I click on, say, the folder 2011, you can see a folder for each month (jan, feb, etc.).

So far I have tried this:

import os, zipfile

z = zipfile.ZipFile('PUBLIC_*.zip')
for f in z.namelist():
    if f.endswith('/'):
        os.makedirs(f)

but it doesn't seem to work?

Any help would be appreciated.

user8261831

2 Answers

Unfortunately, I do not have experience with the zipfile module, but if you are asking how to navigate to each of these folders, I would approach the problem like this:

import glob
import os
import zipfile

main_dir = 'C:\\Users\\Folder1'  # wherever you have saved all this data, in full path form
output_dir = os.path.join(main_dir, 'OUTPUT')
os.makedirs(output_dir, exist_ok=True)  # make a folder to save output

for year in range(2010, 2016 + 1):  # years 2010-2016
    for month in range(1, 12 + 1):  # months 1-12
        # build the full path instead of chdir-ing, so each loop starts clean
        data_dir = os.path.join(
            main_dir, str(year), 'MMSDM_{0}_{1:02d}'.format(year, month),
            'MMSDM_Historical_Data_SQLLoader', 'DATA')
        # ZipFile does not expand wildcards, so let glob find the archives
        for zip_path in glob.glob(os.path.join(data_dir, 'PUBLIC_*.zip')):
            try:
                with zipfile.ZipFile(zip_path) as z:
                    # do stuff with zip file here (placeholder: extract everything)
                    z.extractall(output_dir)
            except (OSError, zipfile.BadZipFile) as e:
                print('Exception occurred for {}: {}'.format(zip_path, e))

I can't verify that it works, because I obviously don't have the files on my PC, and there are still some blanks to fill in, like "# do stuff with zip file here", but hopefully this helps get you on track! Let me know if you need more clarification.
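One possible way to fill in the "do stuff" blank, as a rough sketch only: read the CSV rows straight out of an archive without extracting it to disk. The function name read_csv_rows and the UTF-8 encoding are assumptions made for illustration; the real files may use a different encoding.

```python
import csv
import io
import zipfile


def read_csv_rows(zip_path):
    """Yield (member_name, row) for every row of every .csv member in the archive."""
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if not name.lower().endswith('.csv'):
                continue  # skip directories and non-CSV members
            # z.open() gives a binary stream; wrap it so csv gets decoded text
            with z.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding='utf-8', newline='')  # encoding is assumed
                for row in csv.reader(text):
                    yield name, row
```

Each row comes back as a list of strings, so it can be filtered or written to a combined output file inside the month loop.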

Reedinationer

This seems to be more related to traversing the file system than to zipfile. For the traversal, you can use [Python 3]: glob - Unix style pathname pattern expansion, and for handling the .zip files, use [Python 3]: zipfile - Work with ZIP archives.

For more details on traversing directories, check [SO]: How do I list all files of a directory? (@CristiFati's answer).
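As a small illustration of the glob part (a sketch, not the only way to do it): the "**" wildcard matches any number of nested directories, but only when recursive=True is passed, which needs Python 3.5 or newer. The helper name find_matching is made up for this example.

```python
import glob
import os


def find_matching(root, pattern):
    """Return the sorted paths under root, at any depth, whose file name matches pattern."""
    # "**" matches zero or more directory levels, but only with recursive=True
    return sorted(glob.glob(os.path.join(root, "**", pattern), recursive=True))
```

Calling find_matching on the top-level folder with the pattern "PUBLIC_*.zip" should return every matching archive, whether it sits directly in that folder or several levels down.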

code.py:

#!/usr/bin/env python3

import sys
import os
import glob
import zipfile


INPUT_DIR = ".\\InDir"
OUTPUT_DIR = ".\\OutDir"


def get_zip_files(path, start_pattern):  # recursive glob requires Python 3.5+
    # "**" with recursive=True matches any number of nested directories
    return glob.iglob(os.path.join(path, "**", start_pattern + "*.zip"), recursive=True)


def main():
    for item in get_zip_files(INPUT_DIR, "PUBLIC_"):
        print("Found .zip file that matches pattern: {:s}".format(item))
        with zipfile.ZipFile(item) as zf:  # close the archive when done
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    print("    Extracting {:s}".format(name))
                    zf.extract(name, path=OUTPUT_DIR)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • I created (in my cwd) a tree structure that is simpler than yours, but the principle is the same
  • The files used are dummy files
  • The algorithm is simple:
    • Search the input dir for .zip files that match the desired pattern (name starts with PUBLIC_)
    • For each such file, extract all the .csv files that it contains into the output dir
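To try this out without the real data, a dummy tree like the one listed in the output can be generated with a short helper. This is only a sketch: make_dummy_tree is a hypothetical name, the file contents are placeholders, and the member name inside OTHER_FILE.zip is invented, since it is not shown anywhere.

```python
import os
import zipfile


def make_dummy_tree(root):
    """Create root/Dir0/Dir00 and root/Dir0/Dir01, each holding small placeholder .zip files."""
    # (sub-directory, archive name, name of the single member inside it)
    layout = [
        ("Dir00", "OTHER_FILE.zip", "other.txt"),  # member name is made up
        ("Dir00", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv"),
        ("Dir01", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv"),
    ]
    created = []
    for sub_dir, zip_name, member in layout:
        dir_path = os.path.join(root, "Dir0", sub_dir)
        os.makedirs(dir_path, exist_ok=True)
        zip_path = os.path.join(dir_path, zip_name)
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr(member, "dummy,data\n")  # placeholder content
        created.append(zip_path)
    return created
```

Running make_dummy_tree("InDir") once, and creating an empty OutDir next to it, should be enough to run code.py.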

Output:

e:\Work\Dev\StackOverflow\q054498244>dir /b
code.py
InDir
OutDir

e:\Work\Dev\StackOverflow\q054498244>dir /b /s InDir
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\OTHER_FILE.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip

e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir

e:\Work\Dev\StackOverflow\q054498244>"e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32

Found .zip file that matches pattern: .\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
    Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
Found .zip file that matches pattern: .\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip
    Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv

e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv

@EDIT0:

For Python 2 compatibility, simply replace get_zip_files with the version below:

def get_zip_files(path, start_pattern):
    start_pattern_lower = start_pattern.lower()
    entries = os.listdir(path)
    for entry in entries:
        entry_lower = entry.lower()
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            for sub_entry in get_zip_files(entry_with_path, start_pattern):
                yield sub_entry
        else:
            if entry_lower.startswith(start_pattern_lower) and entry_lower.endswith(".zip"):
                yield entry_with_path
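
A shorter alternative, offered only as a sketch: os.walk already does the recursion and is available in both Python 2 and Python 3, so the manual recursive call can be dropped. The name get_zip_files_walk is made up here; for an ordinary directory tree it should yield the same files, though possibly in a different order.

```python
import os


def get_zip_files_walk(path, start_pattern):
    """Yield paths of .zip files under path whose name starts with start_pattern (case-insensitive)."""
    prefix = start_pattern.lower()
    # os.walk visits path and every directory below it
    for dir_path, _dir_names, file_names in os.walk(path):
        for file_name in file_names:
            lower = file_name.lower()
            if lower.startswith(prefix) and lower.endswith(".zip"):
                yield os.path.join(dir_path, file_name)
```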
CristiFati