The following code is confusing the mess out of me. I've got a zip file which I am opening in a context manager, and I'm trying to extract its contents to a temporary directory. However, when I execute this code block, it fails with "Attempt to read ZIP archive that was already closed". I find this very strange, as the zip file in question was opened with a context manager! I've inserted several print statements that call methods and properties of the zip file object, and they all return successfully.

Where have I gone wrong? Why does the file believe itself closed?

Any help would be appreciated!

(Edit) Please find the traceback below.

Also, is there a better way to check whether a zipfile is in fact open, other than testing whether `.fp` is truthy?
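
For reference, the kind of check being asked about can be sketched as follows. This is only an illustration: `is_zip_open` is a hypothetical helper name, and it relies on `ZipFile.close()` resetting the undocumented `fp` attribute to `None`, which is an implementation detail rather than a documented guarantee. The in-memory archive is there only to keep the example self-contained.

```python
import io
import zipfile


def is_zip_open(zf):
    """Return True while the ZipFile still holds an underlying file object."""
    # Assumption: close() sets .fp to None (undocumented implementation detail).
    return getattr(zf, "fp", None) is not None


# Build a tiny archive in memory so the sketch needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hi")
    open_inside = is_zip_open(zf)   # checked while the with block is active

open_after = is_zip_open(zf)        # checked after the with block has exited
```

Comparing against `None` explicitly avoids depending on the truthiness of whatever file-like object happens to back the archive.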

if config.get('settings', 'new_quarter') == "Yes":
        #This gets the latest zip file, by year and quarter
        new_statements_path = os.path.join(config.get('cleaning', 'historic_dir'), 'sql_files')
        for directory,dirnames, filenames in os.walk(new_statements_path):
            zips = [f for f in filenames if ".zip" in f]
            highest_quarter = max([z.split('Q')[1].split('.')[0] for z in zips])
            print 'Targeting this quarter for initial tables: %s' % (highest_quarter)
            for z in zips:
                if 'sql_files' in f:
                    if z.split('Q')[1].split('.')[0] == highest_quarter:
                        with zipfile.ZipFile(os.path.join(directory,z), 'r') as zip_f:
                            print zip_f.fp
                            initial_tables = tempfile.mkdtemp()
                            print 'initial tables', initial_tables, os.path.exists(initial_tables) 
                            #Ensure the file is read/write by the creator only
                            saved_umask = os.umask(0077)

                            try:
                                print zip_f.namelist()
                                print zip_f.fp
                                zip_f.printdir()

                                zip_f.extractall(path=initial_tables)
                            except:
                                print traceback.format_exc()
                                os.umask(saved_umask)
                                if os.path.exists(initial_tables) == True:
                                    shutil.rmtree(initial_tables)

Traceback:

Traceback (most recent call last):
  File "/Users/n/GitHub/s/s/s/extract/extract.py", line 60, in extract_process
    zip_f.extractall(path=initial_tables)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1043, in extractall
    self.extract(zipinfo, path, pwd)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1031, in extract
     return self._extract_member(member, path, pwd)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1085, in _extract_member
    with self.open(member, pwd=pwd) as source, \
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 946, in open
    "Attempt to read ZIP archive that was already closed"
RuntimeError: Attempt to read ZIP archive that was already closed
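
For context, this error is raised whenever a member is read after the archive has been closed, for example from code that runs after the `with` block has already exited. A minimal sketch of that situation, using an in-memory archive with a made-up member name (`a.txt`) rather than the real files:

```python
import io
import zipfile

# Create a small archive in memory (hypothetical content, for illustration).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "data")

buf.seek(0)
with zipfile.ZipFile(buf, "r") as zip_f:
    names = zip_f.namelist()  # works: the archive is open here

# The with block has exited, so zip_f is closed at this point.
try:
    zip_f.read("a.txt")
    error = None
except (RuntimeError, ValueError) as exc:
    # Python 2.7 raises RuntimeError; recent Python 3 releases raise ValueError.
    error = str(exc)
```

So a statement that looks as if it sits inside the `with` block, but is actually parsed at a shallower indentation level, would produce exactly this traceback.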

(SECOND EDIT)

Here's the (reasonably) minimal and complete version. In this case the code runs fine, which makes sense: there's nothing fancy going on. What's interesting is that I placed this full example immediately above the previous one in the same file. The code below still executes just fine, but the earlier code still produces the same error. The only difference I can see is the new_statements_path variable: in the earlier code, this string comes from a config file. Surely that isn't the root of the error, but I can't see any other differences.

import traceback
import os
import zipfile
import tempfile
import shutil

new_statements_path = '/Users/n/Official/sql_files'
for directory,dirnames, filenames in os.walk(new_statements_path):
    zips = [f for f in filenames if ".zip" in f]
    highest_quarter = max([z.split('Q')[1].split('.')[0] for z in zips])
    print 'Targeting this Quarter for initial tables: %s' % (highest_quarter)
    for z in zips:
            if 'sql_files' in f:
                    if z.split('Q')[1].split('.')[0] == highest_quarter:
                            with zipfile.ZipFile(os.path.join(directory,z), 'r') as zip_f:
                                    print zip_f.fp
                                    initial_tables = tempfile.mkdtemp()
                                    print 'initial tables', initial_tables, os.path.exists(initial_tables) 
                                    #Ensure the file is read/write by the creator only
                                    saved_umask = os.umask(0077)

                                    try:
                                            print zip_f.namelist()
                                            print zip_f.fp
                                            zip_f.printdir()
                                            zip_f.extractall(path=initial_tables)
                                    except:
                                            print traceback.format_exc()
                                            os.umask(saved_umask)
                                            if os.path.exists(initial_tables) == True:
                                                    shutil.rmtree(initial_tables)

    if os.path.exists(initial_tables) == True:
            shutil.rmtree(initial_tables)
  • can we have stacktrace please? – Jean-François Fabre Sep 06 '17 at 16:59
  • @Jean-FrançoisFabre Of course! How could I forget? – SolipsisticAltruist Sep 06 '17 at 17:06
  • strange. And I cannot reproduce. Can you try to cut it down to a [mcve] ? – Jean-François Fabre Sep 06 '17 at 17:16
  • @Jean-FrançoisFabre I've added the MCV example, along with some additional comments. (thanks, btw). – SolipsisticAltruist Sep 06 '17 at 20:28
  • At this point, I think only you can work towards a fix. Here's how I would do it: you have some working code. Now add crap until you get the behaviour of the non-working code. It's easier that way. Once you know what causes the problem (and if you cannot fix it), [edit] your question (with even less code) and ping me if you need help. – Jean-François Fabre Sep 06 '17 at 20:31
  • This sounds like an indentation problem to me. – user2357112 Sep 06 '17 at 20:32
  • Yup, edit view shows mixed tabs and spaces. It's an indentation problem. – user2357112 Sep 06 '17 at 20:33
  • As a minor stylistic thing to maybe reduce indents, swap `if 'sql_files' in f:` with `if 'sql_files' not in f: continue` – Nick T Sep 06 '17 at 20:34
  • @user2357112 that makes perfect sense! BTW can you change your pseudo at some time? I always have to check your rep to know it's you :) I didn't think about it because I thought that SO text view killed the tabs. Apparently not :) – Jean-François Fabre Sep 06 '17 at 20:36
  • @SolipsisticAltruist run your program with the `-tt` option, i.e. `python -tt extract.py ...`. That will emit errors if Python finds mixed tabs/spaces. (A better solution would be to use Python 3...) – Nick T Sep 06 '17 at 20:36
  • @Jean-FrançoisFabre: The rendered view doesn't keep tab characters, but they're still visible in the edit view, or the source view under the revision history. I make a habit of checking the edit view for tabs whenever control flow doesn't seem to match the apparent structure of the code. As for my username, I could, but I like how this one goes 2, 3, 5, 7, 11. – user2357112 Sep 06 '17 at 20:49
  • You're right on the money @user2357112! And the -tt option is a great suggestion. Turns out sublime text was indenting using tabs! Why is that the default setting? Either way, thanks so much guys/gals/humans/entities! – SolipsisticAltruist Sep 06 '17 at 20:51
  • @user2357112 correct. That's why copying OP inputs when they have tab-separated inputs is a pain. I'll remember to copy inputs from the edit box, not the rendered one. – Jean-François Fabre Sep 06 '17 at 20:51
  • @user2357112 I understand you're a prime user :) now I see the pattern. – Jean-François Fabre Sep 06 '17 at 20:51
  • @ everyone: since this is a typo question, but we had a hard time (I had :)) figuring this out, I'm closing as a duplicate instead. I'll try to find the most suitable ones (there must be a few out there :)) so don't be offended by such a radical closure. After all the problem is solved – Jean-François Fabre Sep 06 '17 at 20:53
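
To illustrate the diagnosis reached in the comments, here is a rough sketch of a check for inconsistent indentation. It is not how `-tt` works internally: `indent_styles` is a hypothetical helper, and the sample string is made up to mimic a block where some lines are indented with spaces and others with tabs.

```python
def indent_styles(source):
    """Map each indentation style found in `source` to its 1-based line numbers."""
    styles = {}
    for lineno, line in enumerate(source.splitlines(), 1):
        # Leading run of spaces and tabs on this line.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented and blank lines
        if set(indent) == {"\t"}:
            kind = "tabs"
        elif set(indent) == {" "}:
            kind = "spaces"
        else:
            kind = "mixed"
        styles.setdefault(kind, []).append(lineno)
    return styles


# Hypothetical sample: line 2 uses spaces, line 3 a tab, line 4 both.
sample = "for z in zips:\n    with open(z) as f:\n\ttry:\n\t    pass\n"
report = indent_styles(sample)
```

If the report contains more than one style, the file is worth normalising to spaces in the editor. The standard library's `tabnanny` module (`python -m tabnanny extract.py`) performs a more thorough check of the same kind.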
