How can I pickle a dictionary object that contains instances of a Class in one file (Python File 1) and pickle.load in another file (Python File 2)?

I have a huge, complicated dataset made up of several files, and I created a class to store all of my attributes. I made a dictionary to store all of the samples and attributes: key = sample, value = instance of the class containing the attributes. Example below:

#Python File 1
import random

class Storage:
    def __init__(self,label,x,y): 
        self.label = label; self.x = x; self.y = y
    def get_x(self): return(self.x)
    def get_y(self): return(self.y)

D_var_instance = {}
L = ["A","B","C"]

for var in L: 
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())

print(D_var_instance["A"])
#<__main__.Storage instance at 0x102811128>

print(D_var_instance["A"].get_x())
#0.193517721574

It takes a long time to build this with my real dataset, so I tried using pickle.dump on the dictionary object, but it's not working:

#Python File 1
import pickle
pickle.dump(D_var_instance,open("/path/to/dump.txt","w"))
pickle.dump(Storage, open("/path/to/storagedump.txt","w"))

I tried loading in another Python file with this code:

#Python File 2
import pickle
Storage = pickle.load(open("/path/to/storagedump.txt","r"))
D_var_instance = pickle.load(open("/path/to/dump.txt","r"))

Got this error:

AttributeError: 'module' object has no attribute 'Storage'
O.rka
  • I hope I don't need to pickle each instance of a class...that would suck because I have thousands of them. – O.rka Oct 08 '15 at 02:02
  • did you try defining the same `Storage` class in the file where you tried loading the pickle? Also, you should open the file for reading/writing pickle using binary mode (`b`). – Anand S Kumar Oct 08 '15 at 02:04
  • I just copy pasted your code and I didn't get your error. I received this as output: `{'A': <__main__.Storage instance at 0x10a128998>, 'C': <__main__.Storage instance at 0x10a1289e0>, 'B': <__main__.Storage instance at 0x10a128a28>}` – idjaw Oct 08 '15 at 02:05
  • @AnandSKumar defining the `Storage` class and the `D_var_instance` dictionary in the same `pickle.dump`? | @idjaw Were they in separate files? I can't get it to work. I changed it to "wb" and "rb" for binary. – O.rka Oct 08 '15 at 02:09
  • @O.rka I used the exact same filepath. I modeled it based on your example. Let me know what you want me to try. I have it set up here to run. – idjaw Oct 08 '15 at 02:11
  • @idjaw I opened up "storage.txt" and the contents are pretty minimal: `c__main__ Storage p0 . ` That's all it says in the file. The `dump.txt` file of the dictionary seems to have everything. – O.rka Oct 08 '15 at 02:13
  • @O.rka no, just the `Storage` class in the file in which you try to read/load the pickle – Anand S Kumar Oct 08 '15 at 02:21
  • This is where I am confused. I would expect storagedump to just have the class, because that is exactly what you are dumping to your pickle file. The dump.txt has the instance information as per what is being written to that file. Under both scenarios I was able to successfully load the file and printout its contents and use the data. Is there something I'm not understanding? – idjaw Oct 08 '15 at 02:21
  • @idjaw the first 4 codeblocks (4th containing the error) is what is happening on mine: http://stefaanlippens.net/pickleproblem . I tried it in iPython with my original dataset, now I'm using my IDE. Could that be why? – O.rka Oct 08 '15 at 02:28
  • Just ran it in iPython as well and it worked. I ran it from my PyCharm IDE and it worked. This code is working fine on my end. – idjaw Oct 08 '15 at 02:32
  • Can you copy the storagedump.txt contents in the comments? Maybe I can debug it from there. – O.rka Oct 08 '15 at 02:32
  • http://pastebin.com/ExTdLhrU – idjaw Oct 08 '15 at 02:35
  • @idjaw the same as mine... I even moved the files all into the same directory. I'm not sure what it could be but that link I posted above seems to be dealing with the same problem. I'll do some recon and let you know if I find a solution. Thanks for helping anyways...so weird it's not working on my computer. I think it's weird how the `storagedump.txt` is so small – O.rka Oct 08 '15 at 02:39
  • the line of code where you are writing to storagedump is simply dumping the class...nothing from the instance of the class. The file that has all the instance information are the lines of code you provided that reference dump.txt. Here is the pastebin of that: http://pastebin.com/ZbeU0d02 – idjaw Oct 08 '15 at 02:41
  • @idjaw Are you trying 2 different Python files? I have one Python file where I processed the data and dumped it. A second Python file where I am trying to retrieve the information. It works when I run it in the same Python file (dumping and loading in same file) but not when they are in separate files. – O.rka Oct 08 '15 at 02:45
  • I just figured it out. Writing an answer for you. – idjaw Oct 08 '15 at 02:52

2 Answers

You can make it easy on yourself by using dill instead of pickle. dill pickles class definitions along with class instances (instead of by reference, like pickle does). So, you don't need to do anything different other than import dill as pickle.

To simulate working in another file, I'll build a class, some class instances in a dict, then delete everything but the pickled string. You can reconstitute from there.

>>> class Foo(object):
...   def __init__(self, x):
...     self.x = x
... 
>>> d = dict(f=Foo(1), g=Foo(2), h=Foo(3))
>>> 
>>> import dill
>>> _stored_ = dill.dumps(d)
>>>        
>>> del Foo
>>> del d
>>> 
>>> d = dill.loads(_stored_)
>>> d['f'].x
1
>>> d['g'].x
2
>>> d['h'].x
3
>>> dill.dump_session()

I finish with a dump_session, to pickle everything in the interpreter to a file. Then, in a new python session (potentially on a different machine), you can start up where you left off.

>>> import dill
>>> dill.load_session()
>>> d
{'h': <__main__.Foo object at 0x110c6cfd0>, 'g': <__main__.Foo object at 0x10fbce410>, 'f': <__main__.Foo object at 0x110c6b050>}
>>> 

If you are looking for the traditional dump and load, that works too. It also works with ipython.
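
As a rough sketch of the traditional file-based dump and load mentioned above: this assumes dill is installed and falls back to the standard pickle module only so the snippet runs anywhere. With the fallback, classes are stored by reference again, so the cross-file benefit applies only when dill is actually used. The file name foo_dict.pkl is hypothetical.

```python
import os
import tempfile

try:
    import dill as pickle  # assumption: dill is installed
except ImportError:
    import pickle  # fallback so the sketch still runs; classes are stored by reference only


class Foo(object):
    def __init__(self, x):
        self.x = x


d = dict(f=Foo(1), g=Foo(2), h=Foo(3))

# Hypothetical file name inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "foo_dict.pkl")

# Pickle files should always be opened in binary mode.
with open(path, "wb") as handle:
    pickle.dump(d, handle)

with open(path, "rb") as handle:
    restored = pickle.load(handle)

summary = sorted((key, obj.x) for key, obj in restored.items())
print(summary)
```

The same two calls, dump and load, are what a second file would use; only the import line decides whether class definitions travel with the data.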

Mike McKerns
  • Thanks for the help but I had to mark the other answer since he has been working with me on this for over an hour. I will definitely look into dill. This looks like something incredibly useful for me. – O.rka Oct 08 '15 at 03:28
  • I look at it this way: why struggle for an hour or more re-coding, when a mere import can fix your issue? But, to each his own. I'm the `dill` author, so if you try `dill` and run into any difficulty, please post an issue. – Mike McKerns Oct 08 '15 at 11:27
  • @MikeMcKerns I didn't know about dill until recently. Great work. Just wanted to let you know! Cheers. – idjaw Feb 25 '16 at 14:32

The problem here can be perfectly explained via this SO post right here

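Ultimately, what is happening here is that pickle stores each instance by reference to the module and class name it was created from (for example, __main__.Storage), not the class definition itself. When you unpickle, that same module and class name must be resolvable from wherever you are loading.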
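
A minimal way to see this, assuming a cut-down version of the Storage class from the question, is to look inside the pickled bytes: the class and module names are there, but nothing from the class body is.

```python
import pickle


class Storage(object):
    def __init__(self, label, x, y):
        self.label = label
        self.x = x
        self.y = y

    def get_x(self):
        return self.x


payload = pickle.dumps(Storage(label="A", x=0.5, y=0.25))

# The payload names the class and the module that defines it...
has_class_name = b"Storage" in payload
has_module_name = Storage.__module__.encode() in payload

# ...but carries none of the class body, such as its method names.
has_method_name = b"get_x" in payload

print(has_class_name, has_module_name, has_method_name)
```

This is also why storagedump.txt in the question is so small: pickling the class itself writes only its module and name.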

Here is some code to illustrate this (explanation to follow):

storage.py

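class Storage(object):
    # Same constructor as in the question, so foo.py can build instances
    def __init__(self, label, x, y):
        self.label = label; self.x = x; self.y = y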

foo.py

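import pickle
import random  # needed for random.random() below
from storage import Storage

D_var_instance = {}
L = ["A","B","C"]

for var in L: 
    D_var_instance[var] = Storage(label=var,x=random.random(),y=random.random())

# Open in binary mode and close the file once the dump is done
with open("/path/pickle.txt", "wb") as f:
    pickle.dump(D_var_instance, f)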

boo.py

import pickle
D_var_instance = pickle.load(open("/path/pickle.txt", "rb"))
print(D_var_instance)

So, when you write your pickle from foo, the reference stored for each instance is storage.Storage. When you go into an entirely different module (boo.py) and unpickle, pickle has to resolve that reference, which only works if the storage module can be imported from where boo.py runs.

There are different ways to make that work. Since I put all three files at the same level, storage is importable from boo.py, so you don't need to import anything yourself and it should work!

However, if your class definition and your pickle-writing code live in the same module, like yours did, the class is pickled as __main__.Storage. In that case boo.py has to import the Storage class from the module that houses it, so the name can be found when unpickling.
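
The failure and the fix can be reproduced in a single file. This is only a sketch: deleting the class name stands in for "a second file that never defined or imported Storage", and restoring it stands in for importing the class there. The saved_class variable is a hypothetical helper for the demonstration.

```python
import pickle


class Storage(object):
    def __init__(self, label, x, y):
        self.label = label
        self.x = x
        self.y = y


payload = pickle.dumps({"A": Storage(label="A", x=0.1, y=0.2)})

# Simulate a second file in which Storage was never defined or imported.
saved_class = Storage
del Storage

try:
    pickle.loads(payload)
    error_message = None
except AttributeError as exc:
    # pickle could not find the class under the module name it recorded
    error_message = str(exc)

# Making the class reachable again under the same module and name fixes it.
Storage = saved_class
restored = pickle.loads(payload)

print(error_message)
print(restored["A"].x)
```

In a real two-file setup, the last step corresponds to importing the class in the loading file before calling pickle.load.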

I suggest you look at the two options provided in the SO post I linked to see which one satisfies you. But that should be your solution.

Running this script from iPython yields:

ipython boo.py
{'A': <storage.Storage instance at 0x1107b77e8>, 'C': <storage.Storage instance at 0x1107b7680>, 'B': <storage.Storage instance at 0x1107b7908>}
idjaw
  • Thanks for taking a look at it! Can you do this inside iPython notebooks as well? If foo.py is actually an iPython notebook. – O.rka Oct 08 '15 at 03:08
  • @O.rka take a look I updated it. We can discuss in chat further if you want. – idjaw Oct 08 '15 at 03:24