
I have a large object in my Python 3 code which, when I try to pickle it with the pickle module, throws the following error:

TypeError: cannot serialize '_io.BufferedReader' object

However, dill.dump() and dill.load() are able to save and restore the object seamlessly.

  1. What causes the trouble for the pickle module?
  2. Now that dill saves and reconstructs the object without any error, is there any way to verify if the pickling and unpickling with dill went well?
  3. How's it possible that pickle fails, but dill succeeds?
sherlock
  • TL;DR: `pickle` doesn't handle functions or complex objects as well as `dill`. I use `dill` for all my Data Science pickling since the models and objects are very deep and complex – JacobIRR Oct 01 '19 at 22:37
  • Dill is also built on top of pickle but like above, it's made for complex objects where pickle can't succeed. – MyNameIsCaleb Oct 01 '19 at 23:31

1 Answer


I'm the dill author.

1) The easiest thing to do is look at this file: https://github.com/uqfoundation/dill/blob/master/dill/_objects.py. It lists what pickle can serialize and what dill can serialize.

2) You can try dill.copy, dill.check, and dill.pickles to check different levels of pickling and unpickling. dill also includes utilities for detecting and diagnosing serialization issues in dill.detect and dill.pointers.
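As a rough illustration of what a round-trip check amounts to, here is a minimal sketch. The helper names `roundtrip` and `survives_roundtrip` and the `same` hook are hypothetical, not part of dill. The sketch relies only on the serializer exposing `dumps` and `loads`, which both pickle and dill do, so it defaults to pickle and can be pointed at dill instead.

```python
import pickle


def roundtrip(obj, serializer=pickle):
    """Serialize obj to bytes and load it back, returning the restored copy."""
    return serializer.loads(serializer.dumps(obj))


def survives_roundtrip(obj, serializer=pickle, same=None):
    """Return True if obj can be dumped, loaded, and still matches the original.

    `same` is a hypothetical hook: a two-argument callable used to compare
    the original with the restored copy. It defaults to ==.
    """
    try:
        restored = roundtrip(obj, serializer)
    except Exception:
        # Any failure while dumping or loading counts as "does not survive".
        return False
    if same is None:
        return restored == obj
    return same(obj, restored)


# Usage, assuming dill is installed:
#   import dill
#   survives_roundtrip(my_object, serializer=dill)
```

Equality is only as good as the object's `__eq__`. For objects without a meaningful `__eq__`, passing a `same` function that compares the attributes you care about is probably the more useful check.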

3) dill is built on pickle, and augments it by registering new serialization functions.
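The general idea of registering a serialization function for a type that pickle rejects can be sketched with the stdlib alone. This is not dill's actual implementation. `ReaderAwarePickler`, `_reduce_reader`, and `dumps_with_readers` are hypothetical names, and the sketch assumes the reader was opened from a path that still exists when the data is loaded. For simplicity it reopens the file at the start rather than restoring the read position.

```python
import copyreg
import io
import pickle


def _reduce_reader(reader):
    """Describe how to rebuild a reader: call io.open(path, "rb") at load time."""
    # Assumption: reader.name is a filesystem path, not a file descriptor.
    return io.open, (reader.name, "rb")


class ReaderAwarePickler(pickle.Pickler):
    # A per-class dispatch table: start from the global registry and add
    # a reduction function for a type that plain pickle refuses.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[io.BufferedReader] = _reduce_reader


def dumps_with_readers(obj):
    """Like pickle.dumps, but able to handle io.BufferedReader instances."""
    buffer = io.BytesIO()
    ReaderAwarePickler(buffer).dump(obj)
    return buffer.getvalue()
```

The output is an ordinary pickle stream, so plain `pickle.loads` can read it back. Only the dumping side needs to know about the extra reduction function.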

4) dill includes serialization variants that let the user choose among different object-dependency serialization strategies (in dill.settings), including source code extraction and object reconstitution with dill.source (an extension of the stdlib inspect module).

Mike McKerns