I'm working on making some Cython objects picklable and have a question about using __setstate__ vs __reduce__. It seems that when you pickle.loads() an object with a __setstate__ method and also a __cinit__ method, the __cinit__ DOES get called (unlike if it were an __init__). Is there a way to prevent this or pass a default argument, or should I just use __reduce__?

Here's a toy problem to illustrate (code modified from this blog).

In test.pyx I have three classes:

cdef class Person:
    cdef public str name
    cdef public int age

    def __init__(self,name,age):
        print('in Person.__init__')
        self.name = name 
        self.age = age 

    def __getstate__(self):
        return (self.name, self.age,)

    def __setstate__(self, state):
        name, age = state
        self.name = name
        self.age = age

cdef class Person2:
    cdef public str name
    cdef public int age

    def __cinit__(self,name,age):
        print('in Person2.__cinit__')
        self.name = name 
        self.age = age 

    def __getstate__(self):
        return (self.name, self.age,)

    def __setstate__(self, state):
        name, age = state
        self.name = name
        self.age = age

cdef class Person3:
    cdef public str name
    cdef public int age

    def __cinit__(self,name,age):
        print('in Person3.__cinit__')
        self.name = name 
        self.age = age 

    def __reduce__(self):
        return (newPerson3,(self.name, self.age))

def newPerson3(name,age):
    return Person3(name,age)

After building with python setup.py build_ext --inplace, pickling Person works as expected (because __init__ does not get called):

import test 
import pickle 

p = test.Person('timmy',12)
p_l = pickle.loads(pickle.dumps(p))

Pickling Person2 fails:

p2 = test.Person2('timmy',12)
p_l = pickle.loads(pickle.dumps(p2))

with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.pyx", line 25, in test.Person2.__cinit__
    print('in Person2.__cinit__')
TypeError: __cinit__() takes exactly 2 positional arguments (0 given)

So __cinit__ gets called....

The __reduce__ method in Person3 works as expected:

p3 = test.Person3('timmy',12)
p_l = pickle.loads(pickle.dumps(p3))

So is there a way to use __setstate__ to pickle Person2?

In my actual problem, the classes are more complex and using __setstate__ would be more straightforward, but maybe I have to use __reduce__ here? I'm new to Cython and custom pickling (and also don't know C well), so I may be missing something obvious.

chris

2 Answers

In a nutshell: Use __getnewargs_ex__ or __getnewargs__ to provide the needed arguments to __cinit__-method.

How does it work? When a Python object is created, this is a two-step process:

  • First, __new__ is used to create an uninitialized object
  • In the second step, __init__ is used to initialize the object created in the first step

pickle uses a slightly different algorithm:

  • __new__ is used to create an uninitialized object
  • __setstate__ (and no longer __init__) is used to initialize the object created in the first step.

That makes sense: __init__ has nothing to do with the "current" state of the object. We don't know the parameters for __init__, and even if __init__ had no parameters, it could possibly do unnecessary work.
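The two sequences above can be observed directly in pure Python. This is only a minimal sketch: the class Tracked and the calls list are hypothetical names introduced for illustration, and a plain Python __new__ stands in for whatever Cython generates.

```python
import pickle

calls = []  # records which special methods run, in order


class Tracked:
    """Hypothetical pure-Python class used only to trace pickle's calls."""

    def __new__(cls, *args):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, name):
        calls.append("__init__")
        self.name = name

    def __getstate__(self):
        return {"name": self.name}

    def __setstate__(self, state):
        calls.append("__setstate__")
        self.name = state["name"]


original = Tracked("timmy")  # normal construction
construction_calls = list(calls)

calls.clear()
restored = pickle.loads(pickle.dumps(original))  # round trip through pickle
unpickle_calls = list(calls)

print(construction_calls)  # ['__new__', '__init__']
print(unpickle_calls)      # ['__new__', '__setstate__']
```

Note that __new__ runs in both cases, which is the hook through which __cinit__ ends up being called.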

Where does __cinit__ come into play? When __cinit__ is defined, Cython automatically generates a __new__ method (which is why it is impossible to define __new__ manually in a cdef class) that calls the provided __cinit__ method before returning. For the Person2 example, this function looks as follows:

static PyObject *__pyx_tp_new_4test_Person2(PyTypeObject *t, PyObject *a, PyObject *k) {
  struct __pyx_obj_4test_Person2 *p;
  PyObject *o;
  if (likely((t->tp_flags & Py_TPFLAGS_IS_ABSTRACT) == 0)) {
    o = (*t->tp_alloc)(t, 0);
  } else {
    o = (PyObject *) PyBaseObject_Type.tp_new(t, __pyx_empty_tuple, 0);
  }
  if (unlikely(!o)) return 0;
  p = ((struct __pyx_obj_4test_Person2 *)o);
  p->name = ((PyObject*)Py_None); Py_INCREF(Py_None);
  if (unlikely(__pyx_pw_4test_7Person2_1__cinit__(o, a, k) < 0)) goto bad;
  return o;
  bad:
  Py_DECREF(o); o = 0;
  return NULL;
}

if (unlikely(__pyx_pw_4test_7Person2_1__cinit__(o, a, k) < 0)) goto bad; is the line where __cinit__ is called.

This makes clear why __cinit__ gets called by pickle, and why we cannot prevent it: __new__ must be called in any case.

pickle, however, provides further hooks for passing the information needed by the __cinit__ method to the __new__ method: __getnewargs_ex__ and __getnewargs__.

Your Person2 class could look as follows:

%%cython
cdef class Person2:
    cdef public str name
    cdef public int age
    
    def __cinit__(self, name, age):
        self.name=name
        self.age=age

    def __getnewargs_ex__(self):
        return (self.name, self.age),{}

    def __getstate__(self):
        return ()
    
    def __setstate__(self, state):
        pass

and now

p2 = test.Person2('timmy',12)
p_l = pickle.loads(pickle.dumps(p2))

does succeed!
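The same hook can be tried without compiling anything. The following is a rough pure-Python analogue, not the Cython class itself: PyPerson is a hypothetical name, and its __new__ with required arguments plays the role that __cinit__ plays in Person2.

```python
import pickle


class PyPerson:
    """Hypothetical pure-Python stand-in: __new__ plays the role of __cinit__."""

    def __new__(cls, name, age):
        self = super().__new__(cls)
        self.name = name
        self.age = age
        return self

    def __getnewargs_ex__(self):
        # (positional args, keyword args) that pickle passes to __new__
        return (self.name, self.age), {}


restored = pickle.loads(pickle.dumps(PyPerson("timmy", 12)))
print(restored.name, restored.age)
```

Without __getnewargs_ex__, unpickling would call __new__ with no arguments and fail with a TypeError, much like Person2 in the question.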

This is a toy example, which doesn't make much sense and thus:

  • __getstate__ and __setstate__ are just dummies here, because all the needed information is passed to __cinit__; in general this is not the case.
  • in this example __cinit__ doesn't make much sense; it would make more sense to have __init__ instead.

Often one uses __cinit__ rather than __init__ for cdef classes. However, this is not always correct, and when pickling is involved it is important to decide what happens in __cinit__ and what happens in __init__.

The other extreme, i.e. putting all of the initialization code into the __init__ method, is a tempting way to avoid issues with pickling. However, the combination __new__ + __init__ isn't atomic: it is possible that __new__ is called and the object is then used before __init__ was invoked (or without it ever being invoked, as pickle does), which might lead to NULL pointer dereferences and other crashes.

One must also be aware that, while __cinit__ is executed exactly once (when __new__ is executed), __init__ can be executed multiple times (for example, __new__ can be overridden by a subclass in such a way that it always returns the same singleton). That means:

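from libc.stdlib cimport malloc

cdef class A:
    cdef char *a
    def __cinit__(self):
        # assign to the C attribute; a bare `a` would be a local variable
        self.a = <char*> malloc(1)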

is ok, while the same code in __init__:

from libc.stdlib cimport malloc

cdef class A:
    cdef char *a
    def __init__(self):
        # may run more than once, overwriting an earlier allocation
        self.a = <char*> malloc(1)

is a possible memory leak, because the pointer a may already hold an earlier allocation rather than NULL; a NULL initial value is guaranteed only in __cinit__.
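That __init__ can run several times on the same object is easy to see in pure Python. A minimal sketch, with the hypothetical classes Base and Singleton standing in for a cdef class and a subclass that overrides __new__:

```python
class Base:
    """Hypothetical base class that counts how often __init__ runs."""

    def __init__(self):
        self.init_calls = getattr(self, "init_calls", 0) + 1


class Singleton(Base):
    """Hypothetical subclass whose __new__ always returns the same object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


first = Singleton()
second = Singleton()  # same object, but __init__ runs again
print(first is second, first.init_calls)  # True 2
```

If __init__ allocated memory on every call, the second call would overwrite the pointer from the first.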

ead

The purpose of __cinit__ is that it always gets run.

Your __cinit__() method is guaranteed to be called exactly once.

In contrast, __init__ may not be called at all (for example in your case, or in inherited classes) or may be called multiple times.

The value of __cinit__ always being called is that many classes with C types must be set up in a certain way or they're automatically invalid; for example, they might expect a pointer to be initialized to some memory that the class holds. (This is the sort of invalid that leads to an automatic crash to desktop, rather than "Python" invalid, which should just lead to Python exceptions.)

Since your toy example only holds Python objects you'd probably be better just using __init__ - there's no reason it must be initialized. You can have both __init__ and __cinit__ if you want, to separate the parts that must happen from the parts that just happen in a normal initialization.

If you do choose to use __cinit__ but want to call it from a variety of contexts with different numbers of arguments, the documentation recommends:

you may find it useful to give the __cinit__() method * and ** arguments so that it can accept and ignore extra arguments.


In summary, use __cinit__ to do the initialization that has to happen for your program not to crash. Accept that it'll be called when unpickling, before __setstate__ runs (since you don't want your program to crash after you use __setstate__, right?). Combine it with __init__ to do the initialization that should generally happen but isn't required and that may sometimes be overridden. Use default arguments or * and ** arguments to make __cinit__ flexible enough for your needs.
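As a rough pure-Python illustration of this split (not Cython code): the hypothetical classes Strict and Flexible below use __new__ in place of __cinit__, and the bytearray merely stands in for setup that must always run.

```python
import pickle


class Strict:
    """Hypothetical: __new__ requires arguments, like Person2.__cinit__."""

    def __new__(cls, name, age):
        return super().__new__(cls)


class Flexible:
    """Hypothetical: __new__ accepts and ignores whatever it is given."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self.buffer = bytearray(1)  # stands in for setup that must always run
        return self

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __getstate__(self):
        return (self.name, self.age)

    def __setstate__(self, state):
        self.name, self.age = state


try:
    pickle.loads(pickle.dumps(Strict("timmy", 12)))
    strict_error = None
except TypeError as exc:  # unpickling calls __new__ without arguments
    strict_error = exc

restored = pickle.loads(pickle.dumps(Flexible("timmy", 12)))
print(type(strict_error).__name__, restored.name, restored.age, len(restored.buffer))
```

Flexible survives the round trip because its __new__ tolerates being called with no arguments, while __setstate__ restores the rest of the state.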

DavidW