Could you please help me write a function returning:

{"file1.txt": list(<contents of file1>),
 "file2.txt": list(<contents of file2>),
 "file3.txt": list(<contents of file3>),
 "file4.txt": list(<contents of file4>)}

On input:

    file.zip:
        outer\
        outer\inner1.zip:
                file1.txt
                file2.txt
        outer\inner2.zip:
                file3.txt
                file4.txt

My attempts (with exceptions below):

user1438003
  • @MikePennington I've attempted to write and debug the code myself. See the 3 differently formulated code links in my question. – user1438003 Jun 05 '12 at 18:07
  • Thanks @wroniasty, it wouldn't let me add more than 2 links (this is my first question, after all!) – user1438003 Jun 05 '12 at 18:16

2 Answers


Finally worked it out, with a bit of help from "Extracting a zipfile to memory?":

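from io import BytesIO
from zipfile import ZipFile, is_zipfile

def extract_zip(input_zip):
    input_zip = ZipFile(input_zip)
    return {name: input_zip.read(name) for name in input_zip.namelist()}

def extract_all(input_zip):
    outer = ZipFile(input_zip)
    # A member name is not a path on disk, so each entry is read into memory
    # and wrapped in BytesIO before being tested and opened as a zip.
    return {entry: extract_zip(BytesIO(outer.read(entry)))
            for entry in outer.namelist()
            if is_zipfile(BytesIO(outer.read(entry)))}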
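
The question asks for one flat dictionary keyed by the innermost file names, while `extract_all` returns one dictionary per inner archive. As a rough sketch of the flat variant, assuming every entry whose name ends in `.zip` is itself an archive and that file names do not repeat across inner archives (the name `extract_nested` is made up for this illustration):

```python
from io import BytesIO
from zipfile import ZipFile


def extract_nested(input_zip):
    """Return {filename: bytes} for every non-zip file, descending into inner zips."""
    result = {}
    with ZipFile(input_zip) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry such as "outer/"
            data = archive.read(name)
            if name.lower().endswith(".zip"):
                # Inner archive: open it from memory and merge its files.
                result.update(extract_nested(BytesIO(data)))
            else:
                result[name] = data
    return result
```

If two inner archives contain a file with the same name, the later one overwrites the earlier one in the result; keying by the full path would avoid that, at the cost of not matching the output format requested in the question.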
user1438003

I modified your code: you should close the ZipFile before deleting it, and I added extraction of the inner zip files:

import os
import shutil
import tempfile
from zipfile import ZipFile

def unzip_recursively(parent_archive):
    result = {}
    tmpdir = tempfile.mkdtemp()
    try:
        # Extract the outer archive into a temporary directory, then close it.
        with ZipFile(parent_archive) as parent:
            parent.extractall(path=tmpdir)
            namelist = parent.namelist()
        for name in namelist:
            # Skip directory entries such as 'outer/' instead of assuming
            # the directory is always the first name in the list.
            if name.endswith('/'):
                continue
            innerzippath = os.path.join(tmpdir, name)
            inner_extract_path = innerzippath + '.content'
            if not os.path.exists(inner_extract_path):
                os.makedirs(inner_extract_path)
            with ZipFile(innerzippath) as inner_zip:
                inner_zip.extractall(path=inner_extract_path)
                for inner_file_name in inner_zip.namelist():
                    # Read as bytes and close each extracted file promptly.
                    with open(os.path.join(inner_extract_path, inner_file_name), 'rb') as f:
                        result[inner_file_name] = f.read()
    finally:
        # Remove the temporary directory even if extraction fails.
        shutil.rmtree(tmpdir)
    return result

if __name__ == '__main__':
    print(unzip_recursively('file.zip'))
Arseniy
  • Thanks, but that code is much more complicated than [mine](http://stackoverflow.com/a/10910305/1438003). Is there a way to clean up the zip by adding just one more function to mine? Actually, I think the memory might get cleaned up automatically by Python's garbage collector; correct me if I'm wrong. – user1438003 Jun 06 '12 at 08:52
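
On the cleanup question in the last comment: the bytes held in the returned dictionary are ordinary Python objects and are freed once nothing references them, but when an open `ZipFile` releases its underlying file is less predictable if that is left to the garbage collector. One way to avoid relying on it, without adding a temporary directory, is a `with` block. The sketch below is only an illustration; `extract_zip_closing` is a made-up name, not part of either answer.

```python
from io import BytesIO
from zipfile import ZipFile


def extract_zip_closing(input_zip):
    """Hypothetical variant of extract_zip that closes the archive itself."""
    # The with-block calls ZipFile.close() on exit, even if read() raises.
    with ZipFile(input_zip) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


if __name__ == "__main__":
    # Build a tiny archive in memory so the example runs without files on disk.
    buffer = BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr("file1.txt", "hello")
    print(extract_zip_closing(BytesIO(buffer.getvalue())))
```

Because the dictionary is fully built before the block exits, the returned contents remain usable after the archive has been closed.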