
I have large (~75MB) pickled objects that are made available on mapped network drives (e.g. X:/folder1/large_pickled_item.pk). The objects contain NumPy arrays and Python lists, and are pickled using cPickle with protocol 2.

When I try to unpickle the data, I get the following error messages:

Using pickle: KeyError: (random character)

Using cPickle: IOError: [Errno 22] Invalid argument

I do not get errors if the pickled objects are smaller in size, or if I copy the (larger) objects to a local drive and run the same script.

Any idea where the problem lies? Is it a Python/pickle problem or a Windows shares issue?

Notes:

  1. I am using Python 2.7.2 on Windows XP Professional (SP3)
  2. I do not have control over the object format: I do not create these objects, I can only read them
  3. Example stack trace:

    File "test.py", line 38, in getObject obj = pickle.load(input) File "C:\software\python\lib\pickle.py", line 1378, in load return Unpickler(file).load() File "C:\software\python\lib\pickle.py", line 858, in load dispatchkey KeyError: '~'

Solution

  1. Read the file in chunks of 67076095 bytes each, accumulating them into a single string buffer.
  2. Call pickle.loads on the string buffer instead of pickle.load on the file object (see the sketch below).
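
A minimal sketch of this workaround in Python 2 (the helper name and the hard-coded path are illustrative, not from the original script):

    import cPickle as pickle

    CHUNK_SIZE = 67076095  # just under the 64MB boundary that triggers the bug

    def load_pickle_from_share(path):
        # Read the whole file in bounded chunks, then unpickle from memory.
        chunks = []
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                chunks.append(chunk)
        return pickle.loads(''.join(chunks))

    obj = load_pickle_from_share('X:/folder1/large_pickled_item.pk')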
    This is probably the same problem as http://stackoverflow.com/questions/4226941/python-ioerror-errno-22-invalid-argument-when-using-cpickle-to-write-large – NPE May 02 '12 at 09:55
  • The question you point to is indeed similar, but it does not solve my problem: the solution there is to use the w+b option to open the file for writing; a similar trick with reading does not work – Dhara May 02 '12 at 10:40

1 Answer


This is due to a Windows bug whereby reading and writing network files in chunks larger than 64MB fails.

I suggest trying the mirror image of the workaround presented in https://stackoverflow.com/a/4228291/367273

If that doesn't help, perhaps you could create a wrapper for the file object that would automatically split every large read() into multiple smaller reads, and present that wrapper to the pickle module?
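
For example, here is a minimal sketch of such a wrapper in Python 2 (the class name and the 32MB chunk size are illustrative; pickle and cPickle only require read() and readline() on the file object, so forwarding those two methods is enough):

    import cPickle as pickle

    class ChunkedReader(object):
        # File-like wrapper that caps the size of each underlying read()
        # so no single request to the network share exceeds the 64MB limit.
        MAX_CHUNK = 32 * 1024 * 1024  # 32MB, safely under 64MB

        def __init__(self, fileobj):
            self._f = fileobj

        def read(self, size=-1):
            if 0 <= size <= self.MAX_CHUNK:
                return self._f.read(size)
            parts = []
            remaining = size  # a negative size means "read to EOF"
            while remaining != 0:
                want = self.MAX_CHUNK if remaining < 0 else min(remaining, self.MAX_CHUNK)
                chunk = self._f.read(want)
                if not chunk:
                    break
                parts.append(chunk)
                if remaining > 0:
                    remaining -= len(chunk)
            return ''.join(parts)

        def readline(self):
            return self._f.readline()

    with open('X:/folder1/large_pickled_item.pk', 'rb') as f:
        obj = pickle.load(ChunkedReader(f))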

  • Thanks for the answer. I already tried the mirror workaround: opening the file with r+b does not work. I can try the other trick, but I am not sure that pickle can work with only part of an object – Dhara May 02 '12 at 11:17
  • @Dhara: I mean that when read() is asked for, say, 64MB, the wrapper would fetch two chunks of 32MB, merge them, and return the result to the pickle module. This would sidestep the 64MB Windows limit. – NPE May 02 '12 at 11:21