The pickle reference states that the set of objects which can be pickled is rather limited. Indeed, I have a function which returns a dynamically generated class, and I found that I can't pickle instances of that class:

>>> import pickle
>>> def f():
...     class A: pass
...     return A
... 
>>> LocalA = f()
>>> la = LocalA()
>>> with open('testing.pickle', 'wb') as f:
...     pickle.dump(la, f, pickle.HIGHEST_PROTOCOL)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: Can't pickle local object 'f.<locals>.A'

Such objects are too complicated for pickle. OK. Now, the magic is that if I try to pickle a similar object, but of a derived class, it works!

>>> class DerivedA(LocalA): pass
... 
>>> da = DerivedA()
>>> with open('testing.pickle', 'wb') as f:
...     pickle.dump(da, f, pickle.HIGHEST_PROTOCOL)
...
>>>

What's happening here? If this is so easy, why doesn't pickle use this workaround to implement a dump method that allows "local objects" to be pickled?

Mike McKerns
fonini

4 Answers

I think you did not read the reference you cite carefully. The reference also clearly states that, among functions and classes, only the following are picklable:

  • functions defined at the top level of a module (using def, not lambda)
  • built-in functions defined at the top level of a module
  • classes that are defined at the top level of a module

Your example

>>> def f():
...     class A: pass
...     return A

does not define a class at the top level of a module; it defines a class within the scope of f(). pickle works on global classes, not local classes, so instances of A automatically fail the picklability test.

DerivedA is a global class, so all is well.

As for why only top-level (global to you) classes and functions can be pickled, the reference answers that question as well (bold mine):

Note that functions (built-in and user-defined) are pickled by “fully qualified” name reference, not by value. This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.

Similarly, classes are pickled by named reference, so the same restrictions in the unpickling environment apply.

So there you have it. pickle serialises functions and classes only by name reference, not by the raw instructions contained within them. This is because pickle's job is to serialise an object hierarchy, and nothing else.
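
A minimal sketch of what "by name reference" means in practice, assuming it is run as a script so that the class lives in an importable top-level module. `Point` and `describe_point` are hypothetical names introduced only for this illustration; they do not come from the reference.

```python
import pickle


class Point:
    """Hypothetical top-level class used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def describe_point(self):
        return f"({self.x}, {self.y})"


payload = pickle.dumps(Point(1, 2))

# The payload records the class name and the name of its module...
has_class_name = b"Point" in payload
has_module_name = Point.__module__.encode() in payload

# ...but none of the class body: the method name never appears in it.
has_method_name = b"describe_point" in payload

# Unpickling looks the class up again by module and name, then restores
# the instance data onto a fresh instance of whatever it finds there.
restored = pickle.loads(payload)
```

The restored object can call `describe_point()` only because `Point` is still defined in the unpickling environment, not because the method travelled with the pickle.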

Community
Akshat Mahajan
  • OK, only the name of the class is pickled. I thought that the class itself was saved (somehow), and then this would imply saving its base class, which is not pickleable. – fonini May 03 '16 at 03:21
  • By the way, if I unpickle it in an environment in which there is an unrelated class with the same name, will I get some Frankenstein's monster object of this unrelated class but with the attributes of the old class? I think it boils down to: pickled objects are not as self-contained as I thought they were. – fonini May 03 '16 at 03:25
  • @fonini Also from the reference: "... when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled." So, yes, that sounds like the right behaviour. :) – Akshat Mahajan May 03 '16 at 03:27
  • I completely stepped over that. – fonini May 03 '16 at 03:40
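
The behaviour discussed in these comments can be sketched as follows. This is an illustration under stated assumptions: the "other environment" is simulated by rebinding the same top-level name within one script, and `Record`, `OldRecord`, and `greet` are hypothetical names.

```python
import pickle


class Record:
    """Original class: its instances carry a `name` attribute."""

    def __init__(self):
        self.name = "old"


payload = pickle.dumps(Record())

# Keep a handle on the original class, then rebind the top-level name
# to an unrelated class, as if unpickling in a different environment.
OldRecord = Record


class Record:
    """Unrelated class that merely shares the name."""

    def greet(self):
        return "hello from the new class"


# pickle resolves "Record" by name, so it finds the new class and
# attaches the old instance data to it without calling __init__.
restored = pickle.loads(payload)
```

Here `restored` is an instance of the new `Record`, yet it still has the `name` attribute saved from the old one.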

I disagree, you can pickle both. You just need to use a better serializer, like dill. dill (by default) pickles classes by saving the class definition instead of pickling by reference, so it won't fail your first case. You can even use dill to get the source code, if you like.

>>> import dill as pickle
>>> def f():
...   class A: pass
...   return A
... 
>>> localA = f()
>>> la = localA()
>>> 
>>> _la = pickle.dumps(la)
>>> la_ = pickle.loads(_la)
>>>    
>>> class DerivedA(localA): pass
... 
>>> da = DerivedA()
>>> _da = pickle.dumps(da)
>>> da_ = pickle.loads(_da)
>>> 
>>> print(pickle.source.getsource(la_.__class__))
  class A: pass

>>> 
Mike McKerns
  • I bet it's about having different things in focus, not necessarily better? Or is dill also better in terms of speed and the size of the stored object? But thanks for referring to it! – ikamen Oct 13 '18 at 06:22
  • `dill` is not faster, and also not better in terms of size of stored object. Indeed it's worse than `pickle` in both of those regards. It's far better in terms of the ability to serialize various types of objects, which is what I meant. – Mike McKerns Oct 13 '18 at 17:05

You can only pickle instances of classes defined at a module's top level.

However, you can pickle instances of locally-defined classes if you promote them to top level.

You must set the __qualname__ class attribute of the local class. Then you must assign the class to a top-level variable of the same name.

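def define_class(name):
    class local_class:
        pass
    # pickle looks the class up by __qualname__, so give it a top-level name.
    local_class.__qualname__ = name
    return local_class

# The variable name must equal the name passed in, or the lookup fails.
class_A = define_class('class_A')  # picklable
class_B = define_class('class_B')  # picklable
class_X = define_class('class_Y')  # unpicklable, names don't match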
haael
  • Your solution seems to be the way to go for me. Your above example works for me, but when I try on my actual code, I keep getting "Can't pickle : it's not the same object as XX.YY". Would you have any idea what I am doing wrong? – Shailesh Appukuttan Aug 26 '20 at 17:47
  • It was my mistake.... I was creating an instance of `class_A` and then redefining this class before trying to pickle it. – Shailesh Appukuttan Aug 28 '20 at 15:44
  • This. And instead of assigning to a variable you can do `_thismodule = sys.modules[__name__]` `setattr(_thismodule, name, define_class(name))` – Permafacture Mar 14 '23 at 15:07
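
A sketch combining the answer with the `setattr` variation from the last comment, assuming it is run as a script. `register_class`, `Widget`, and `Gadget` are hypothetical names introduced here, and both exception types are caught because the one raised for a failed lookup may differ between Python versions.

```python
import pickle
import sys


def define_class(name):
    class local_class:
        pass
    # Give the class the qualified name pickle will look up later.
    local_class.__qualname__ = name
    return local_class


def register_class(name):
    """Create a class and bind it at module top level under `name`."""
    cls = define_class(name)
    setattr(sys.modules[__name__], name, cls)
    return cls


# Registered: the name resolves to this exact class, so a round trip works.
widget_cls = register_class("Widget")
obj = widget_cls()
obj.value = 42
restored = pickle.loads(pickle.dumps(obj))

# Not registered: the lookup by name finds nothing, so pickling fails.
gadget_cls = define_class("Gadget")
try:
    pickle.dumps(gadget_cls())
    unregistered_pickled = True
except (AttributeError, pickle.PicklingError):
    unregistered_pickled = False
```

Note that the name must keep pointing at the same class object: rebinding it between creating an instance and pickling it produces the "not the same object" error mentioned in the comments.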

DerivedA instances are pickleable because DerivedA is available through a global variable matching its fully-qualified name, which is how pickle looks for classes when unpickling.

The problem with trying to do something like this with local classes is that there's nothing identifying which A class an instance corresponds to. If you run f twice, you get two A classes, and there's no way to tell which one should be the class of unpickled A instances from another run of the program. If you don't run f at all, you get no A classes, and then what the heck do you do about the type of unpickled instances?
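
The ambiguity can be seen directly. A minimal sketch, reusing `f`, `LocalA`, and `DerivedA` from the question and assuming it is run as a script:

```python
import pickle


def f():
    class A:
        pass
    return A


# Each call builds a brand-new class, yet both report the same name.
first, second = f(), f()
same_qualname = first.__qualname__ == second.__qualname__
are_distinct = first is not second

# A top-level subclass has an unambiguous name, so pickle can find it
# again even though its base class is local.
LocalA = f()


class DerivedA(LocalA):
    pass


restored = pickle.loads(pickle.dumps(DerivedA()))
```

The name `f.<locals>.A` cannot say which of the two classes is meant, whereas `DerivedA` resolves to exactly one object in the module.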

user2357112