
I am trying to read a zip file in Python that was written with PKZIP:

import zipfile
fname = "myfile.zip"
unzipped = zipfile.ZipFile(fname, "r")

But I get this error:

    unzipped = zipfile.ZipFile(fname, "r")
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 1285, in _RealGetContents
    endrec = _EndRecData(fp)
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 282, in _EndRecData
    return _EndRecData64(fpin, -sizeEndCentDir, endrec)
  File "/home/username/anaconda3/envs/c1/lib/python3.7/zipfile.py", line 228, in _EndRecData64
    raise BadZipFile("zipfiles that span multiple disks are not supported")
zipfile.BadZipFile: zipfiles that span multiple disks are not supported

As far as I can tell, this file does not span multiple disks. I say this because:

  1. Checking against the solution in this Stack Overflow answer, my version of zipfile was appropriately patched.

  2. It unzips fine on the Linux command line with:

    $ unzip myfile.zip

So it doesn't seem to actually be a bad zip file. When I read the first few bytes with raw file access, there is a header suggesting that PKZIP may format this file in an unusual way:

  b'PK\x03
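For reference, the four bytes b'PK\x03\x04' are the local file header signature defined in the PKZIP application note, and a typical single-part ZIP archive begins with them, so a prefix like the one above does not by itself indicate unusual formatting. The multi-disk error in the traceback comes from records read at the end of the file instead. A minimal sketch for checking the leading signature (the helper name starts_with_local_header is hypothetical):

```python
# Signature of a ZIP local file header (0x04034b50, stored little-endian).
LOCAL_FILE_HEADER_SIG = b"PK\x03\x04"


def starts_with_local_header(path):
    """Return True if the file at `path` begins with a local file header."""
    with open(path, "rb") as f:
        # Read only the first four bytes and compare them to the signature.
        return f.read(4) == LOCAL_FILE_HEADER_SIG
```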

Examining the Python library documentation for zipfile, I found a reference to the PKZIP Application Note:

The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note.

The documentation links to the application note. It is very thorough, but I don't see concrete instructions on which options to pass to zipfile in order to parse the file correctly.

PKZIP is in fairly wide use, so I'm surprised not to find more examples or native support. What options are necessary to open, in Python, a PKZIP file that throws this multiple-disk error?
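zipfile.ZipFile does not appear to expose a parameter that skips this check. Since the command-line unzip reads the archive without complaint, one pragmatic workaround is to try zipfile first and shell out to unzip only when it raises BadZipFile. This is a minimal sketch, assuming an unzip binary is on PATH; the helper name extract_with_fallback is hypothetical.

```python
import subprocess
import zipfile


def extract_with_fallback(path, dest):
    """Extract the archive at `path` into `dest`.

    Hypothetical helper: tries the standard-library zipfile module first and
    falls back to the external `unzip` command (assumed to be on PATH) if
    zipfile rejects the archive. Returns the name of the tool that worked.
    """
    try:
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(dest)
        return "zipfile"
    except zipfile.BadZipFile:
        # -o: overwrite without prompting, -q: quiet, -d: target directory.
        subprocess.run(["unzip", "-o", "-q", path, "-d", dest], check=True)
        return "unzip"
```

This only sidesteps the error; it does not explain why zipfile rejects the file.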

Mittenchops

1 Answer


The link you posted changed zipfile from this

if diskno != 0 or disks != 1:
    raise BadZipFile("zipfiles that span multiple disks are not supported")

to this

if diskno != 0 or disks > 1:
    raise BadZipFile("zipfiles that span multiple disks are not supported")

If you are still getting the error "zipfiles that span multiple disks are not supported", it means that diskno != 0 or disks > 1.
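According to the traceback, this error is raised in _EndRecData64, which in the 3.7 source reads the ZIP64 end of central directory locator; there, disks holds the locator's total-number-of-disks field. Because that field is unsigned, the old and new conditions disagree only when disks == 0. A small sketch comparing them (the function names are illustrative, not part of zipfile):

```python
def rejects_before_patch(diskno, disks):
    # Original check: reject unless the locator says exactly one disk.
    return diskno != 0 or disks != 1


def rejects_after_patch(diskno, disks):
    # Patched check: only a disk count above 1 is treated as multi-disk.
    return diskno != 0 or disks > 1


# Print both verdicts for a few (diskno, disks) pairs; only (0, 0) differs.
for diskno, disks in [(0, 1), (0, 0), (0, 2), (1, 1)]:
    print(diskno, disks,
          rejects_before_patch(diskno, disks),
          rejects_after_patch(diskno, disks))
```

If a writer stores 0 in that field, an unpatched zipfile would raise this error while unzip still accepts the file. That is an assumption about myfile.zip until its end records are inspected.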

You need to find out more about the internal structure of myfile.zip.

Try running zipdetails and check the very last section of its output. Below is what a single-disk archive should look like:

# zipdetails  fred.zip 
...
3CF31 END CENTRAL HEADER    06054B50
3CF35 Number of this disk   0000
3CF37 Central Dir Disk no   0000
3CF39 Entries in this disk  0009
3CF3B Total Entries         0009
3CF3D Size of Central Dir   00000317
3CF41 Offset to Central Dir 0003CC1A
3CF45 Comment Length        0000
Done
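If zipdetails is not available, a rough Python equivalent can read the same end-of-central-directory fields with the struct module and also report the ZIP64 locator that precedes it, if any, since that locator is what _EndRecData64 checks. This is a sketch based on the record layouts in the application note; the function name and dictionary keys are hypothetical, and it does not validate offsets the way zipfile does.

```python
import struct

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
ZIP64_LOCATOR_SIG = b"PK\x06\x07"  # ZIP64 end of central directory locator
EOCD = struct.Struct("<4s4H2LH")         # 22 bytes, little-endian
ZIP64_LOCATOR = struct.Struct("<4sLQL")  # 20 bytes, little-endian


def read_end_records(path):
    """Return the disk-related fields stored at the end of a ZIP file."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        size = f.tell()
        # The EOCD record is 22 bytes plus a comment of at most 65535 bytes.
        tail_len = min(size, EOCD.size + 0xFFFF)
        f.seek(size - tail_len)
        tail = f.read(tail_len)

        # Take the last occurrence of the EOCD signature.
        pos = tail.rfind(EOCD_SIG)
        if pos < 0 or pos + EOCD.size > len(tail):
            raise ValueError("no end of central directory record found")
        (_, this_disk, cd_disk, entries_here, entries_total,
         _cd_size, _cd_offset, _comment_len) = EOCD.unpack_from(tail, pos)
        result = {
            "this_disk": this_disk,            # "Number of this disk"
            "cd_start_disk": cd_disk,          # "Central Dir Disk no"
            "entries_on_disk": entries_here,   # "Entries in this disk"
            "total_entries": entries_total,    # "Total Entries"
            "zip64_locator": None,
        }

        # A ZIP64 locator, if present, sits immediately before the EOCD.
        eocd_offset = size - tail_len + pos
        if eocd_offset >= ZIP64_LOCATOR.size:
            f.seek(eocd_offset - ZIP64_LOCATOR.size)
            sig, eocd64_disk, _rel_offset, total_disks = ZIP64_LOCATOR.unpack(
                f.read(ZIP64_LOCATOR.size))
            if sig == ZIP64_LOCATOR_SIG:
                result["zip64_locator"] = {
                    "eocd64_disk": eocd64_disk,
                    "total_disks": total_disks,
                }
    return result
```

For a plain single-disk archive, zip64_locator is None; when it is present, its total_disks value is the one compared in the multi-disk check quoted above.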
pmqs
  • Interesting. So I looked at the code for zipfile and saw it was fixed in line with the quoted answer: https://github.com/python/cpython/blob/3.7/Lib/zipfile.py#L227 However my installation of python 3.7.3 on disk has the offending line still wrong in zipfile.py as "!=" not ">". This may be another question, but how is that even possible for the standard library? – Mittenchops Dec 12 '19 at 15:52
  • @Mittenchops - does your problem go away if you modify your installation of python 3.7.3? – pmqs Dec 12 '19 at 16:36
  • Yes it looks like it does. – Mittenchops Dec 12 '19 at 16:45
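Regarding the first comment, one way to confirm which zipfile.py an interpreter actually imports, and whether it contains the patched condition, is to inspect its source. This sketch relies on the private helper zipfile._EndRecData64, which the traceback shows exists in 3.7 but which is not a public API and may change between versions; the function name zipfile_has_zero_disk_fix is hypothetical.

```python
import inspect
import zipfile


def zipfile_has_zero_disk_fix():
    """Return (path of the imported zipfile module, fix flag).

    The flag is True if the private _EndRecData64 source contains the
    patched `disks > 1` condition, False if it does not, and None if the
    function or its source is unavailable.
    """
    func = getattr(zipfile, "_EndRecData64", None)
    if func is None:
        return zipfile.__file__, None
    try:
        source = inspect.getsource(func)
    except (OSError, TypeError):
        return zipfile.__file__, None
    return zipfile.__file__, "disks > 1" in source


print(zipfile_has_zero_disk_fix())
```

A False flag means the imported module still lacks the `disks > 1` form of the check, regardless of what the upstream branch contains.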