This seems to be more related to file system traversal than to zipfile. For the traversal, you can use [Python 3]: glob - Unix style pathname pattern expansion, and for handling the .zip files, use [Python 3]: zipfile - Work with ZIP archives.
For more details on traversing directories, check [SO]: How do I list all files of a directory? (@CristiFati's answer).
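To show what the recursive glob pattern actually matches, here is a small sketch. It builds a throwaway tree in a temporary directory (the file names and the helper find_matching_zips are hypothetical, for illustration only) and relies on the documented behavior that, with recursive=True, "**" matches zero or more directory levels:

```python
import glob
import os
import tempfile


def find_matching_zips(root, prefix):
    # "**" matches zero or more directory levels only when recursive=True (Python 3.5+)
    pattern = os.path.join(root, "**", prefix + "*.zip")
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one matching archive at the top level, one nested
    # two levels deep, and one archive whose name does not match the prefix
    os.makedirs(os.path.join(root, "Dir0", "Dir00"))
    for rel in ("PUBLIC_top.zip",
                os.path.join("Dir0", "Dir00", "PUBLIC_nested.zip"),
                os.path.join("Dir0", "Dir00", "OTHER_FILE.zip")):
        open(os.path.join(root, rel), "wb").close()  # empty placeholder file
    found = [os.path.relpath(p, root) for p in find_matching_zips(root, "PUBLIC_")]
    print(found)
```

Both PUBLIC_ archives are reported (including the one directly in root, since "**" can match zero directories), while OTHER_FILE.zip is skipped.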
code.py:
#!/usr/bin/env python3

import sys
import os
import glob
import zipfile


INPUT_DIR = ".\\InDir"
OUTPUT_DIR = ".\\OutDir"


def get_zip_files(path, start_pattern):  # Requires Python 3.5+ (recursive glob)
    # "**" together with recursive=True descends into every subdirectory of path
    return glob.iglob(os.path.join(path, "**", start_pattern + "*.zip"), recursive=True)


def main():
    for item in get_zip_files(INPUT_DIR, "PUBLIC_"):
        print("Found .zip file that matches pattern: {:s}".format(item))
        # The with statement makes sure the archive is closed after extraction
        with zipfile.ZipFile(item) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    print(" Extracting {:s}".format(name))
                    zf.extract(name, path=OUTPUT_DIR)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()
Notes:
- I created (in my cwd) a tree structure that is simpler than yours, but the principle is the same
- The files used are dummies
- The algorithm is simple:
    - Search the input dir (recursively) for .zip files that match the desired pattern (name starts with PUBLIC_)
    - For each such file, extract all the .csv files it contains into the output dir
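If you want to reproduce the algorithm without preparing files by hand, the following sketch creates a dummy tree similar to the one listed below, but inside a temporary directory rather than the cwd. The helpers make_dummy_zip and extract_csvs, as well as the archive contents, are hypothetical; the search and extraction steps follow the two bullets above:

```python
import glob
import os
import tempfile
import zipfile


def make_dummy_zip(path, members):
    # Create an archive whose members are (name, text) pairs
    with zipfile.ZipFile(path, "w") as zf:
        for name, text in members:
            zf.writestr(name, text)


def extract_csvs(input_dir, output_dir, prefix):
    # Step 1: find matching archives anywhere under input_dir
    # Step 2: extract only their .csv members into output_dir
    extracted = []
    pattern = os.path.join(input_dir, "**", prefix + "*.zip")
    for zip_path in sorted(glob.glob(pattern, recursive=True)):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                if name.lower().endswith(".csv"):
                    zf.extract(name, path=output_dir)
                    extracted.append(name)
    return extracted


with tempfile.TemporaryDirectory() as work:
    in_dir = os.path.join(work, "InDir")
    out_dir = os.path.join(work, "OutDir")
    for sub in ("Dir00", "Dir01"):
        os.makedirs(os.path.join(in_dir, "Dir0", sub))
    os.makedirs(out_dir)
    make_dummy_zip(os.path.join(in_dir, "Dir0", "Dir00", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip"),
                   [("PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv", "a,b\n1,2\n"),
                    ("readme.txt", "not a csv, so it is skipped")])
    make_dummy_zip(os.path.join(in_dir, "Dir0", "Dir01", "PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip"),
                   [("PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv", "a,b\n3,4\n")])
    make_dummy_zip(os.path.join(in_dir, "Dir0", "Dir00", "OTHER_FILE.zip"),
                   [("other.csv", "x\n")])  # name lacks the prefix, so it is ignored
    extracted = extract_csvs(in_dir, out_dir, "PUBLIC_")
    out_listing = sorted(os.listdir(out_dir))
    print(out_listing)
```

Only the two .csv members from the PUBLIC_ archives end up in OutDir; the .txt member and the non-matching archive are left alone.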
Output:
e:\Work\Dev\StackOverflow\q054498244>dir /b
code.py
InDir
OutDir
e:\Work\Dev\StackOverflow\q054498244>dir /b /s InDir
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\OTHER_FILE.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
e:\Work\Dev\StackOverflow\q054498244\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip
e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir
e:\Work\Dev\StackOverflow\q054498244>"e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Found .zip file that matches pattern: .\InDir\Dir0\Dir00\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_00.zip
Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
Found .zip file that matches pattern: .\InDir\Dir0\Dir01\PUBLIC_DVD_DISPATCH_UNIT_SCDATA_01.zip
Extracting PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv
e:\Work\Dev\StackOverflow\q054498244>dir /b OutDir
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_0.csv
PUBLIC_DVD_DISPATCH_UNIT_SCDATA_1.csv
@EDIT0:
For Python 2 compatibility (glob has no recursive argument there), simply replace get_zip_files with the version below:
def get_zip_files(path, start_pattern):
    # Manual recursion based on os.listdir; the name match is case-insensitive
    start_pattern_lower = start_pattern.lower()
    entries = os.listdir(path)
    for entry in entries:
        entry_lower = entry.lower()
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            # Descend into the subdirectory and re-yield everything found there
            for sub_entry in get_zip_files(entry_with_path, start_pattern):
                yield sub_entry
        else:
            if entry_lower.startswith(start_pattern_lower) and entry_lower.endswith(".zip"):
                yield entry_with_path
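As an alternative to the hand-written recursion, os.walk (available in both Python 2 and 3) can do the directory descent for you. This is a swapped-in technique, not the approach above; get_zip_files_walk is a hypothetical name, and the demo part uses tempfile.mkdtemp plus shutil.rmtree so that it also runs on Python 2:

```python
import os
import shutil
import tempfile


def get_zip_files_walk(path, start_pattern):
    # os.walk yields (dir_path, dir_names, file_names) for every directory under path
    start_pattern_lower = start_pattern.lower()
    for dir_path, dir_names, file_names in os.walk(path):
        for file_name in file_names:
            name_lower = file_name.lower()
            # Same case-insensitive test as the recursive version above
            if name_lower.startswith(start_pattern_lower) and name_lower.endswith(".zip"):
                yield os.path.join(dir_path, file_name)


# Hypothetical demo tree, removed again in the finally clause
root = tempfile.mkdtemp()
try:
    os.makedirs(os.path.join(root, "Dir0", "Dir01"))
    for rel in (os.path.join("Dir0", "public_a.ZIP"),
                os.path.join("Dir0", "Dir01", "PUBLIC_b.zip"),
                "OTHER.zip"):
        open(os.path.join(root, rel), "wb").close()
    found = sorted(os.path.relpath(p, root) for p in get_zip_files_walk(root, "PUBLIC_"))
finally:
    shutil.rmtree(root)
print(found)
```

Because the comparison is done on lowercased names, public_a.ZIP is picked up as well, matching the behavior of the os.listdir version.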