For some reason I cannot get cPickle.load to work on the file-like object returned by ZipFile.open(). If I first call read() on that object, though, I can use cPickle.loads on the result.

Example:

import zipfile
import cPickle

# the data we want to store
some_data = {1: 'one', 2: 'two', 3: 'three'}

#
# create a zipped pickle file
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', cPickle.dumps(some_data))
zf.close()

#
# cPickle.loads works
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd1 = cPickle.loads(zf.open('data.pkl').read())
zf.close()

#
# cPickle.load doesn't work
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd2 = cPickle.load(zf.open('data.pkl'))
zf.close()

Note: I don't want to zip just the pickle file but many files of other types. This is just an example.

Dharman
eric.frederich
  • have you tried `import picklefork`? :P – FrustratedWithFormsDesigner Jun 09 '10 at 14:27
  • 3
    Try telling us what "doesn't work" means in this case; we don't have crystal balls. – John Machin Jun 09 '10 at 14:37
  • @John, how hard is it to copy and paste the code he's given? It gets an `EOFError` in the last snippet. – Alex Martelli Jun 09 '10 at 14:39
  • @Alex: effort of ONE person copy/pasting the traceback etc much less than effort of MULTIPLE people copy/pasting the code into a GUESSED version of Python – John Machin Jun 09 '10 at 14:44
  • 1
    No traceback here (as `cPickle` is C-coded), and the `.open` method was introduced in Python 2.6 -- so, where's the guess? (if the OP was using 2.7 - still just a release candidate - or the still-rare Py3 I **would** definitely expect a mention or tag;-). Not defending the general idea of posting incomplete info, but this question (with short, complete, stand-alone code) is **way** above the SO average, so singling it out for criticism seems quite inappropriate to me -- sure it could be better (mention 2.6.5 or whatever specific version, OS used, EOFError, ...), but only marginally. – Alex Martelli Jun 09 '10 at 18:10

1 Answer

It's due to an imperfection in the pseudofile object implemented by the zipfile module (for the .open method of the ZipFile class introduced in Python 2.6). Consider:

>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
>>> 

The sequence .read(1) followed by .readline() is what .load does internally on a protocol-0 pickle (the default in Python 2, which is what you're using here). Unfortunately, zipfile's imperfection means this particular sequence doesn't work: it produces a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
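For comparison, here's the same read(1), readline(), read(1) sequence run against io.BytesIO, an in-memory file object that handles mixed read/readline calls correctly. This is a small sketch added for illustration, not part of the original answer; it falls back to plain pickle on Python 3 (where the C accelerator is used automatically), and since the memo index on the first line differs between Python versions, the comments don't promise an exact value.

```python
import io

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3: the C implementation is picked up automatically

some_data = {1: 'one', 2: 'two', 3: 'three'}

# Protocol 0 is Python 2's default (what the question ends up using);
# Python 3 defaults to a binary protocol, so ask for 0 explicitly.
payload = pickle.dumps(some_data, 0)

f = io.BytesIO(payload)
first = f.read(1)    # the MARK opcode, '('
line = f.readline()  # rest of the first line: 'dp' plus a memo index and '\n'
after = f.read(1)    # a well-behaved file object keeps returning data here

# Rewind and unpickle: load() works because the file object behaves correctly.
f.seek(0)
reloaded = pickle.load(f)
```

With zipfile's 2.6 pseudofile, the third call is where the spurious empty string shows up.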

Not sure offhand if this bug in Python's standard library is fixed in Python 2.7 -- I'm going to check.

Edit: just checked -- the bug is fixed in Python 2.7 rc1 (the release candidate that's currently the latest 2.7 version). I don't yet know whether it's fixed in the latest bug-fix release of 2.6 as well.

Edit again: the bug is still there in Python 2.6.5, the latest bug-fix release of Python 2.6 -- so if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
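If neither upgrading nor changing the dump protocol is an option, another way around the buggy pseudofile (a suggestion added here, not the backport discussed above) is to skip ZipFile.open entirely: ZipFile.read decompresses the whole member into a byte string, and io.BytesIO turns that into a well-behaved file object. The helper name load_pickle_member is hypothetical, and the archive is built in memory only to keep the sketch self-contained:

```python
import io
import zipfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


def load_pickle_member(zf, name):
    """Unpickle archive member `name` from the open ZipFile `zf` (hypothetical helper).

    The member is decompressed fully into memory and wrapped in io.BytesIO,
    so pickle.load never touches the pseudofile returned by ZipFile.open.
    """
    return pickle.load(io.BytesIO(zf.read(name)))


# Build a small archive in memory, pickled with protocol 0 as in the question.
some_data = {1: 'one', 2: 'two', 3: 'three'}
buf = io.BytesIO()
zf = zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', pickle.dumps(some_data, 0))
zf.close()  # writes the central directory; buf itself stays open

# Reopen for reading and load through the helper.
zf = zipfile.ZipFile(buf, 'r')
loaded = load_pickle_member(zf, 'data.pkl')
zf.close()
```

The cost is that each member is held in memory in full, the same trade-off as the question's loads(read()) version; the BytesIO wrapper just keeps a file-object interface for code that expects one.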

Note that it's not certain you do need better-behaving pseudofile objects; if you control the dump calls and can use the latest-and-greatest protocol, everything will be fine:

>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> zf.close()
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
>>> sd2 = cPickle.load(zf.open('data.pkl'))
>>> 

It's only the old, crufty, backwards-compatible "protocol 0" (the default) that requires proper pseudofile object behavior when read and readline calls are mixed during the load. Protocol 0 is also slower and produces larger pickles, so it's definitely not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles it produces, is a mandatory constraint in your application.
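To put the size and ASCII points in concrete terms, this sketch (an addition; the list of 1000 integers is an arbitrary choice, and exact byte counts vary with the data and the Python version, so none are hard-coded) pickles the same list with protocol 0 and with pickle.HIGHEST_PROTOCOL, which is what -1 selects:

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

data = list(range(1000))

p0 = pickle.dumps(data, 0)                        # text-based protocol 0
ph = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # newest binary protocol

# Protocol 0 output for this data is pure ASCII; decoding raises otherwise.
p0_text = p0.decode('ascii')

# Both round-trip to the same list; only the encoding differs.
same = pickle.loads(p0) == pickle.loads(ph) == data
```

Comparing len(p0) and len(ph) shows the binary pickle coming out noticeably smaller for this list.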

Alex Martelli