How do I pickle an instance of a frozen dataclass with __slots__? For example, the following code raises an exception in Python 3.7.0:

import pickle
from dataclasses import dataclass

@dataclass(frozen=True)
class A:
  __slots__ = ('a',)
  a: int

b = pickle.dumps(A(5))
pickle.loads(b)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'a'

This works if I remove either frozen=True or __slots__. Is this just a bug?

drhagen

3 Answers

The problem comes from pickle using the __setattr__ method of the instance when setting the state of the slots.

The default __setstate__ is defined in load_build in _pickle.c line 6220.

For the items in the state dict, the instance __dict__ is updated directly:

 if (PyObject_SetItem(dict, d_key, d_value) < 0)

whereas for the items in the slotstate dict, the instance's __setattr__ is used:

if (PyObject_SetAttr(inst, d_key, d_value) < 0)

Now because the instance is frozen, __setattr__ raises FrozenInstanceError when loading.
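
To make the two code paths concrete, here is a rough pure-Python sketch of what the default state restoration does. The helper name apply_default_state is made up for illustration, and the logic is only an approximation of the C code quoted above, not the real implementation:

```python
from dataclasses import dataclass, FrozenInstanceError


def apply_default_state(inst, state):
    """Hypothetical approximation of pickle's default state restoration."""
    slotstate = None
    if isinstance(state, tuple) and len(state) == 2:
        state, slotstate = state
    if state:
        # Items of the state dict are written straight into __dict__,
        # so the class's __setattr__ is never called.
        inst.__dict__.update(state)
    if slotstate:
        for key, value in slotstate.items():
            # Slots are set with setattr, which goes through the
            # (frozen) class's __setattr__.
            setattr(inst, key, value)
    return inst


@dataclass(frozen=True)
class WithDict:
    a: int


@dataclass(frozen=True)
class WithSlots:
    __slots__ = ('a',)
    a: int


# The __dict__ path works even though the class is frozen.
plain = apply_default_state(WithDict.__new__(WithDict), {'a': 5})

# The slots path trips over the frozen __setattr__.
try:
    apply_default_state(WithSlots.__new__(WithSlots), (None, {'a': 5}))
    slots_error = None
except FrozenInstanceError as exc:
    slots_error = exc
```

This also shows why removing either frozen=True or __slots__ makes the error go away: without slots nothing goes through setattr, and without frozen setattr does not raise.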

To circumvent this, you can define your own __setstate__ method which will use object.__setattr__, and not the instance's __setattr__.

The docs give some sort of warning for this:

There is a tiny performance penalty when using frozen=True: __init__() cannot use simple assignment to initialize fields, and must use object.__setattr__().

It may also be good to define __getstate__, because the instance has no __dict__ in your case. If you don't, the state argument of __setstate__ will be a tuple (None, {'a': 5}): the first element stands in for the instance's __dict__ (None here) and the second is the slotstate dict.
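
If you want to see that default state for yourself, one way (a small sketch, not something pickle requires you to do) is to look at the third element of what __reduce_ex__ returns for the class from the question:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class A:
    __slots__ = ('a',)
    a: int


# __reduce_ex__ returns (callable, args, state, ...); index 2 is the state
# that would be handed to __setstate__ or to the default restoration logic.
state = A(5).__reduce_ex__(2)[2]

# Unpack the (dict state, slot state) pair.
dict_part, slot_part = state
```

Here dict_part is None because there is no instance __dict__, and slot_part holds the slot values.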

import pickle
from dataclasses import dataclass

@dataclass(frozen=True)
class A:
    __slots__ = ('a',)
    a: int

    def __getstate__(self):
        return dict(
            (slot, getattr(self, slot))
            for slot in self.__slots__
            if hasattr(self, slot)
        )

    def __setstate__(self, state):
        for slot, value in state.items():
            object.__setattr__(self, slot, value) # <- use object.__setattr__


b = pickle.dumps(A(5))
pickle.loads(b)
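
If you have several such classes, the same two methods could be moved into a small mixin. This is only a sketch under the assumption that every slot corresponds to a dataclass field; the names PickleFrozenSlots and Point are made up for the example:

```python
import pickle
from dataclasses import dataclass, fields


class PickleFrozenSlots:
    """Hypothetical mixin adding pickle support to frozen, slotted dataclasses."""
    __slots__ = ()  # keep subclasses free of an instance __dict__

    def __getstate__(self):
        # Collect only the dataclass fields that are actually set.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if hasattr(self, f.name)
        }

    def __setstate__(self, state):
        for name, value in state.items():
            # Bypass the frozen __setattr__, as in the workaround above.
            object.__setattr__(self, name, value)


@dataclass(frozen=True)
class Point(PickleFrozenSlots):
    __slots__ = ('x', 'y')
    x: int
    y: int


restored = pickle.loads(pickle.dumps(Point(1, 2)))
```

Using fields() instead of self.__slots__ is a design choice here: it avoids depending on which class in the hierarchy declares the slots, at the cost of ignoring slots that are not dataclass fields.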

I personally would not call it a bug, as the pickling process is designed to be flexible, but there is room for an enhancement. A revision of the pickling protocol could fix this in the future. Unless I am missing something, and aside from the tiny performance penalty, using PyObject_GenericSetattr for all the slots might be a reasonable fix?

Uyghur Lives Matter
Jacques Gaudin
  • I updated the previously deleted answer. It all boils down to creating your own `__setstate__` method to avoid a call to the instance's `__setattr__`. – Jacques Gaudin Mar 23 '19 at 15:27
  • 1
    do you know why it works if the class doesn't have slots? does it not use setattr to initialize in that case? – Arne Mar 23 '19 at 22:29
  • @Arne Good point, I just had a look in the source code of `pickle` and it seems that the other attributes are dealt with by modifying `__dict__` directly whereas the slots use `__setattr__`. I'll take a deeper look and update. – Jacques Gaudin Mar 23 '19 at 22:50
  • 2
    I have [reported](https://bugs.python.org/issue36424) this issue and recommended your workaround to the Python bug tracker. Hopefully, it will make it into the standard library. – drhagen Mar 25 '19 at 15:22
  • @drhagen Great, I'll try to compile with `PyObject_GenericSetattr` when I'll get a chance. Fingers crossed! – Jacques Gaudin Mar 25 '19 at 16:40
  • Just for anyone looking for a little more detail, see https://www.python.org/dev/peps/pep-0307/ – Holger Hoefling Apr 19 '20 at 12:51
  • Just to note: the same issue appears to arise when using Python's multiprocessing library. – Damien Jan 25 '21 at 20:26

Starting in Python 3.10.0, this works but only if you specify the slots via slots=True in the dataclass decorator. It does not work, and probably will never work, with __slots__ manually specified.

import pickle
from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class A:
  a: int

b = pickle.dumps(A(5))
pickle.loads(b)  # A(a=5)
drhagen

If you only need the class to be hashable, you can force the generation of a __hash__ method with the unsafe_hash=True option. You will not get the immutability guarantee, but true immutability is impossible in Python anyway.

The relevant Python documentation states:

Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.

import pickle
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class A:
    __slots__ = ('a',)
    a: int

b = pickle.dumps(A(5))
hash(pickle.loads(b))  # works and can hash!
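
To illustrate why the docs ask for care, here is a short sketch of what can go wrong when such an instance is mutated after being placed in a set (the variable names are just for the example):

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class A:
    __slots__ = ('a',)
    a: int


item = A(5)
seen = {item}
hash_before = hash(item)

item.a = 6  # allowed, because the class is not frozen
hash_after = hash(item)

# The set stored the item under its old hash, so the lookup now misses.
found_after_mutation = item in seen
```

The hash is computed from the field values, so it changes with the mutation and the set can no longer find the object it contains. This is fine as long as you treat instances as read-only once they are used as keys.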
김민준