11

I'd like to pass object state between two Python programs (one is my own code running standalone, the other is a Pyramid view) that run in different namespaces. Somewhat related questions are here or here, but I can't quite follow through with them for my scenario.

My own code defines a global class (i.e. in the __main__ namespace) with a fairly complex structure:

# An instance of this is a colorful mess of nested lists and sets and dicts.
class MyClass:
    def __init__(self):
        # Store the state on the instance; bare local names would be discarded.
        self.data = set()
        self.more = dict()
        ...

    def do_sth(self):
        ...

At some point I pickle an instance of this class:

import pickle

c = MyClass()
# Fill c with data.

# Pickle and write the MyClass instance within the __main__ namespace.
# Protocol -1 selects the highest protocol available.
with open("my_c.pik", "wb") as f:
    pickle.dump(c, f, -1)

A hexdump -C my_c.pik shows that the first couple of bytes contain __main__.MyClass from which I assume that the class is indeed defined in the global namespace, and that this is somehow a requirement for reading the pickle. Now I'd like to load this pickled MyClass instance from within a Pyramid view, which I assume is a different namespace:

# In Pyramid (different namespace) read the pickled MyClass instance.
with open("my_c.pik", "rb") as f :
    c = pickle.load(f)

But that results in the following error:

File ".../views.py", line 60, in view_handler_bla
  c = pickle.load(f)
AttributeError: 'module' object has no attribute 'MyClass'

It seems to me that the MyClass definition is missing in whatever namespace the view code executes? I had hoped (assumed) that pickling is a somewhat opaque process which allows me to read a blob of data into whichever place I choose. (More on Python's class names and namespaces is here.)
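The hexdump observation can be reproduced in a few lines. This is only a sketch: it pins the class to `__main__` explicitly (an assumption made so the snippet behaves the same however it is launched) and uses protocol 2 because that protocol writes the class reference as readable text.

```python
import pickle
import sys


class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}

    def do_sth(self):
        return len(self.data)


# Assumption for this sketch: make sure the class is registered as
# __main__.MyClass, which is what happens when a script defines it directly.
MyClass.__module__ = "__main__"
setattr(sys.modules["__main__"], "MyClass", MyClass)

# Protocol 2 stores the class as the text "module\nname\n".
blob = pickle.dumps(MyClass(), protocol=2)

print(b"__main__\nMyClass\n" in blob)  # True: only a reference to the class
print(b"do_sth" in blob)               # False: no method code or names
```

So the pickle holds a module name, a class name and the instance state; the reader has to be able to resolve that module and name on its own.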

How can I handle this properly? (Ideally without having to import stuff across...) Can I somehow find the current namespace and inject MyClass (like this answer seems to suggest)?

Poor Solution

It seems to me that if I refrain from defining and using MyClass and instead fall back to plain built-in datatypes, this wouldn't be a problem. In fact, I could "serialize" the MyClass object into a sequence of calls that pickle the individual elements of the MyClass instance:

# 'Manual' serialization of c works, because all elements are built-in types.
pickle.dump(c.data, f, -1)
pickle.dump(c.more, f, -1)
...

This would defeat the purpose of wrapping data into classes though.

Note

Pickling takes care only of the state of an instance, not of any functions defined in the scope of its class (e.g. do_sth() in the above example). That means that loading a MyClass instance into a different namespace, where a stand-in class is supplied instead of the proper class definition, restores only the instance data; calling a missing function like do_sth() will cause an AttributeError.
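A small sketch of that behaviour, with both "programs" simulated in one process. `ReaderStub` is a hypothetical stand-in name, and registering classes on `sys.modules["__main__"]` by hand is an assumption made to keep the example self-contained.

```python
import pickle
import sys

main = sys.modules["__main__"]


class MyClass:
    """Writer side: has state and a method."""

    def __init__(self):
        self.data = {1, 2, 3}

    def do_sth(self):
        return len(self.data)


# Assumption: pin the class to __main__, as in the standalone program.
MyClass.__module__ = "__main__"
main.MyClass = MyClass
blob = pickle.dumps(MyClass())


class ReaderStub:
    """Hypothetical reader-side stand-in that defines no methods."""


# Reader side: whatever is reachable as __main__.MyClass gets instantiated.
main.MyClass = ReaderStub
loaded = pickle.loads(blob)

print(loaded.data)  # the instance state came through
try:
    loaded.do_sth()
except AttributeError as exc:
    print("AttributeError:", exc)  # the method did not
```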

Jens
  • have you thought about using [named tuples](http://stackoverflow.com/questions/2970608/what-are-named-tuples-in-python)? They should be picklable, too. – User Nov 02 '14 at 11:21
  • @User: That'd be part of the "poor solution" above where I fall back to plainer types. Yes, I tried it and it works. But that's an avoidant workaround, not an answer to my question ;-) – Jens Nov 02 '14 at 11:33
  • You could simply define the `MyClass` in a module different than `__main__`... your first program should use some custom module to define that class, and then you just have to add it to the path for the second program. – Bakuriu Nov 02 '14 at 13:39
  • @Bakuriu: How would that resolve the issue with the `__main__` namespace which the Pyramid code doesn't execute in? I can't import that module in the `__main__` namespace in the other program. – Jens Nov 02 '14 at 16:44
  • If you define `MyClass` in another module, say `A`, then python will *not* look in the `__main__` namespace to load the class but it will search for the `A` module and load the class from there (since the name saved in the pickle file will be `A.MyClass` instead of `__main__.MyClass`). This completely avoids the whole problem. Defining things in `__main__` should be done only for small scripts or things that you don't plan to need somewhere else. – Bakuriu Nov 02 '14 at 18:35
  • @Bakuriu: I've just added a note to the question. It looks like I will *have* to load the proper class definition (instead of faking a symbolic excuse) to ensure that all functions are available to the instance read from the pickle. – Jens Nov 04 '14 at 00:49
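The suggestion in the comments above (define the class in its own module rather than in `__main__`) can be sketched as follows. The module name `mymodels` is hypothetical; in a real project it would be a file importable by both programs, and it is built in memory here only so the snippet is self-contained.

```python
import pickle
import sys
import types


class MyClass:
    def __init__(self):
        self.data = set()
        self.more = {}

    def do_sth(self):
        return sorted(self.data)


# Assumption: simulate a shared module file "mymodels.py" that both the
# standalone program and the Pyramid view could import.
mymodels = types.ModuleType("mymodels")
MyClass.__module__ = "mymodels"
mymodels.MyClass = MyClass
sys.modules["mymodels"] = mymodels

c = MyClass()
c.data.update({3, 1, 2})
blob = pickle.dumps(c, protocol=2)

# The pickle now refers to mymodels.MyClass instead of __main__.MyClass.
print(b"mymodels\nMyClass\n" in blob)

# Any program that can import mymodels gets the full class back, methods included.
restored = pickle.loads(blob)
print(restored.do_sth())
```

With a real shared module the reader needs no special handling: unpickling imports the module by name and finds the class there.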

3 Answers

13

Use dill instead of pickle: for a class defined in __main__, dill by default serializes the class definition itself rather than a reference to it.

>>> import dill
>>> class MyClass:
...   def __init__(self): 
...     self.data = set()
...     self.more = dict()
...   def do_stuff(self):
...     return sorted(self.more)
... 
>>> c = MyClass()
>>> c.data.add(1)
>>> c.data.add(2)
>>> c.data.add(3)
>>> c.data
set([1, 2, 3])
>>> c.more['1'] = 1
>>> c.more['2'] = 2
>>> c.more['3'] = lambda x:x
>>> def more_stuff(self, x):  
...   return x+1
... 
>>> MyClass.more_stuff = more_stuff
>>> 
>>> with open('my_c.pik', "wb") as f:
...   dill.dump(c, f)
... 
>>> 

Shut down the session, and restart in a new session…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('my_c.pik', "rb") as f:
...   c = dill.load(f)
... 
>>> c.data
set([1, 2, 3])
>>> c.more
{'1': 1, '3': <function <lambda> at 0x10473ec80>, '2': 2}
>>> c.do_stuff()
['1', '2', '3']
>>> c.more_stuff(5)
6

Get dill here: https://github.com/uqfoundation/dill

Mike McKerns
  • And a "dilled" object (with its class) can be then read back into any namespace? – Jens Nov 03 '14 at 19:24
  • Yes. It can even be read back into the same namespace where the class definition has changed, and it still works. – Mike McKerns Nov 03 '14 at 23:02
  • I've just added a note to the original question regarding the functions of a class—they are *not* pickled; only state is. It seems that [dill](http://trac.mystic.cacr.caltech.edu/project/pathos/browser/dill/tests) does indeed include the functions defined within `class` scope? So a "dilled" class instance can then be loaded into a different namespace *and* its functions can be called (without having to include the module)? – Jens Nov 04 '14 at 00:56
  • Yes. Using `dill` to serialize a class instance will also serialize the class definition, which includes the class methods. You can even serialize an instance of a class method, and it will work similarly as well. – Mike McKerns Nov 04 '14 at 02:52
  • I added some methods both in the class def and dynamically, to demonstrate your new edits. – Mike McKerns Nov 04 '14 at 03:04
  • Oh now *that* is very neat :) Let me give this a shot! (Bummer, no [MacPorts](https://www.macports.org/) yet...) – Jens Nov 04 '14 at 06:56
  • A little off-topic: would `dill` be able to take a coredump snapshot when an exception was caught and store it away for post-mortem debugging? Could I then load that coredump into an interpreter and poke around? – Jens Nov 06 '14 at 16:23
  • `Exceptions` can be pickled by `dill`, but `Tracebacks` can't yet. A `Traceback` depends on the `Frame` object, and the `Frame` object would require a "pickle" of sorts of the GIL -- which as far as I know can't be done. I'm still working on it. – Mike McKerns Nov 07 '14 at 16:41
2

Solution 1

When pickle.load runs, the module __main__ needs to contain a function or class called MyClass. This does not need to be the original class with the original source code; it can define different methods, and the loaded instance will use whatever the stand-in class provides.

import pickle

# Stand-in for the original class; it must be reachable as __main__.MyClass.
class MyClass(object):
    pass

with open("my_c.pik", "rb") as f:
    c = pickle.load(f)

Solution 2

Use the copyreg module, which registers constructors and reduction functions for pickling specific types. This is the example given in the module documentation for a complex number:

import copyreg

# Reduction function: returns the constructor and the arguments that rebuild c.
def pickle_complex(c):
    return complex, (c.real, c.imag)

copyreg.pickle(complex, pickle_complex, complex)
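Applied to the question's class, one possible use of copyreg is to reduce a MyClass instance to a type that every reader already has. The sketch below is an assumption on my part, not something this answer prescribes: `reduce_myclass` is a hypothetical helper, and the choice of `types.SimpleNamespace` as the rebuilt type is arbitrary.

```python
import copyreg
import pickle
import types


class MyClass:
    def __init__(self):
        self.data = {1, 2, 3}
        self.more = {"a": 1}


def reduce_myclass(obj):
    # (callable, args, state): the reader calls SimpleNamespace() and then
    # restores the attributes from the state dict.
    return types.SimpleNamespace, (), dict(vars(obj))


# Register the reduction function for MyClass in this (writing) process.
copyreg.pickle(MyClass, reduce_myclass)

blob = pickle.dumps(MyClass())

# The pickle no longer mentions MyClass, so any namespace can load it.
loaded = pickle.loads(blob)
print(type(loaded).__name__, loaded.data, loaded.more)
```

The trade-off is the one raised in the question's note: the reader gets the data with attribute access, but none of MyClass's methods.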

Solution 3

Override the persistent_id method of the Pickler and the persistent_load method of the Unpickler. pickler.persistent_id(obj) shall return an identifier that unpickler.persistent_load(id) can resolve back to the object.
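A minimal sketch of that pair of hooks, assuming the identifier is simply a tag plus the instance's attribute dict. `StatePickler`, `StateUnpickler` and the `"MyClass"` tag are hypothetical names; the reader would define its own MyClass and its own unpickler subclass.

```python
import io
import pickle


class MyClass:
    def __init__(self):
        self.data = set()
        self.more = {}


class StatePickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Replace MyClass instances by a tag and their state;
        # returning None means "pickle this object as usual".
        if isinstance(obj, MyClass):
            return ("MyClass", dict(vars(obj)))
        return None


class StateUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, state = pid
        if tag != "MyClass":
            raise pickle.UnpicklingError("unsupported persistent id: %r" % (tag,))
        # Rebuild with whatever MyClass the reading program defines.
        obj = MyClass.__new__(MyClass)
        obj.__dict__.update(state)
        return obj


c = MyClass()
c.data.add(1)
c.more["k"] = "v"

buf = io.BytesIO()
StatePickler(buf).dump(c)

buf.seek(0)
restored = StateUnpickler(buf).load()
print(restored.data, restored.more)
```

As the comments below point out, this means carrying both subclasses between the two programs.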

Jens
User
  • I don't have (want to touch) the Pyramid code. But your answer goes along with what [this](http://stackoverflow.com/questions/1947904/how-can-i-pickle-a-nested-class-in-python#1948057) answer suggests: inject a symbol and (meaningless) type information. In fact, when I add `setattr(sys.modules["__main__"], "MyClass", type(MyClass()))` before the `pickle.load()` call, everything seems to work. Note that `MyClass` doesn't need to be defined anywhere. Just feels rather *hackish* to me ... :-) – Jens Nov 02 '14 at 08:10
  • There is a solution to this, too. It is called `persistent_id`. This method shall be overwritten by subclasses of `Pickler` and `Unpickler`. It allows returning ids and resolving them. Alternatively you can use `copyreg` module. Have a look at it. – User Nov 02 '14 at 08:35
  • Thank you for the three suggestions. Which one is the cleanest and most pythonic one? It seems that both solutions 2 and 3 would require carrying additional functionality between the two pieces of Python code in order to ensure that `MyClass` instances can be pickled/unpickled across namespaces... – Jens Nov 02 '14 at 10:52
  • I would use copyreg. To me the problem is: what is a MyClass instance? If you really need the whole module for an instance to exist, you must share the module. Maybe you only want to have MyClass instances that have different functionality depending on the program they are in; then I would recommend copyreg as a plug-in mechanism. If you use persistent_id you would also need to subclass the Pickler class and the Unpickler class or monkey-patch it in. So I would choose copyreg. There is a fourth solution, which is storing the whole information of the class in the dump (not good). – User Nov 02 '14 at 11:18
0

The easiest solution is to use cloudpickle:

https://github.com/cloudpipe/cloudpickle

It enabled me to easily send a pickled class file to another machine and unpickle it using cloudpickle again.