
I have a class MyClass defined in my_module. MyClass has a method pickle_myself which pickles the instance of the class in question:

def pickle_myself(self, pkl_file_path):
    with open(pkl_file_path, 'w+') as f:
        pkl.dump(self, f, protocol=2)

I have made sure that my_module is in PYTHONPATH. In the interpreter, executing __import__('my_module') works fine:

>>> __import__('my_module')
<module 'my_module' from 'A:\my_stuff\my_module.pyc'>

However, when eventually loading the file, I get:

File "A:\Anaconda\lib\pickle.py", line 1128, in find_class
  __import__(module)
ImportError: No module named my_module

Some things I have made sure of:


EDIT -- A toy example that reproduces the error:

The example itself is spread over a bunch of files.

First, we have the module ball (stored in a file called ball.py):

class Ball():
    def __init__(self, ball_radius):
        self.ball_radius = ball_radius

    def say_hello(self):
        print "Hi, I'm a ball with radius {}!".format(self.ball_radius)

Then, we have the module test_environment:

import os
import ball
#import dill as pkl
import pickle as pkl

class Environment():
    def __init__(self, store_dir, num_balls, default_ball_radius):
        self.store_dir = store_dir
        self.balls_in_environment = [ball.Ball(default_ball_radius) for x in range(num_balls)]

    def persist(self):
        pkl_file_path = os.path.join(self.store_dir, "test_stored_env.pkl")

        with open(pkl_file_path, 'w+') as f:
            pkl.dump(self, f, protocol=2)

Then, we have a module that has functions to make environments, persist them, and load them, called make_persist_load:

import os
import test_environment
#import pickle as pkl
import dill as pkl


def make_env_and_persist():
    cwd = os.getcwd()

    my_env = test_environment.Environment(cwd, 5, 5)

    my_env.persist()

def load_env(store_path):
    stored_env = None

    with open(store_path, 'rb') as pkl_f:
        stored_env = pkl.load(pkl_f)

    return stored_env

Then we have a script to put it all together, in test_serialization.py:

import os
import make_persist_load

MAKE_AND_PERSIST = True
LOAD = (not MAKE_AND_PERSIST)

cwd = os.getcwd()
store_path = os.path.join(cwd, "test_stored_env.pkl")

if MAKE_AND_PERSIST == True:
    make_persist_load.make_env_and_persist()

if LOAD == True:
    loaded_env = make_persist_load.load_env(store_path)

To make this toy example easy to use, I have put it all in a Github repository that simply needs to be cloned into a directory of your choice. Please see the README for instructions, which I also reproduce here:

Instructions:

1) Clone repository into a directory.

2) Add repository directory to PYTHONPATH.

3) Open up test_serialization.py, and set the variable MAKE_AND_PERSIST to True. Run the script in an interpreter.

4) Close the previous interpreter instance, and start up a new one. In test_serialization.py, change MAKE_AND_PERSIST to False, and this will programmatically set LOAD to True. Run the script in an interpreter, causing ImportError: No module named test_environment.

5) By default, the test is set to use dill, instead of pickle. In order to change this, go into test_environment.py and make_persist_load.py, to change imports as required.


EDIT: after switching to dill '0.2.5.dev0', dill.detect.trace(True) output

C2: test_environment.Environment
# C2
D2: <dict object at 0x000000000A9BDAE8>
C2: ball.Ball
# C2
D2: <dict object at 0x000000000AA25048>
# D2
D2: <dict object at 0x000000000AA25268>
# D2
D2: <dict object at 0x000000000A9BD598>
# D2
D2: <dict object at 0x000000000A9BD9D8>
# D2
D2: <dict object at 0x000000000A9B0BF8>
# D2
# D2

EDIT: the toy example works perfectly well when run on Mac/Ubuntu (i.e. Unix-like systems?). It only fails on Windows.

bzm3r
  • I'm guessing that `__import__('my_module')` works from your current directory. Can you test with absolute path to see if you run into the same problem? – code_dredd Nov 28 '15 at 07:05
  • How are you "eventually loading the file"? – martineau Nov 28 '15 at 07:58
  • I cloned, followed the steps and it worked for me. I don't know what _Run the script in an interpreter_ means so I just did `python testing_serialization.py`. – tdelaney Nov 28 '15 at 17:32
  • I also followed the steps 2-5, using the files as posted above. It works for me (I used `import dill as pkl` in both files). So… what version of `dill` do you have? (I'm using the master, from github)… what version of python do you have? (I'm using 2.7)… what OS do you have? (I'm on MacOSX)… Are you running from '.'? (I always am)… Are you deleting everything between runs (`.pyc`, `.pkl`, …)? Are you changing directories between the `dump` run and the `load`? – Mike McKerns Nov 28 '15 at 18:46
  • @MikeMcKerns **(1)** `dill version: dill.info.this_version: 0.2.4`, **(2)** Python version: `Python 2.7.10`, 3) OS: Windows 10 Pro, **(3)** I am not sure what running from '.' means -- do you mean from that `test_serialization.py` is in the same directory as the modules, `ball`, `test_environment`, etc.? In that case, yes, I am running from '.' **(4)** I was not deleting everything between runs, but I just now tried it and it made no difference, **(5)** I am not changing the directory where `test_serialization.py` is stored in between runs (not sure if interpreting question correctly) – bzm3r Nov 28 '15 at 19:58
  • @MikeMcKerns I used `pip` to install `dill`, since I am not sure if I know how to install it if I got it from master on github... – bzm3r Nov 28 '15 at 19:58
  • Ok, it's probably one of two things: (1) there have been a few `classmethod` improvements to the trunk of `dill` since the release of `0.2.4`, and installing from `github` will get you all the updates. However, I expect it's (2) you are on windows -- I don't regularly test on windows, and I've not tried your code on windows. To update to the trunk/master version on github, do this: `pip install --user git+https://github.com/uqfoundation/dill` or `pip install git+https://github.com/uqfoundation/dill.git@master`. If that fails, I blame windows. – Mike McKerns Nov 28 '15 at 22:24
  • And, if it still fails, try setting `dill.detect.trace(True)`, and post the traceback. You can also try tweaking the serialization settings, with `dill.settings`, for example: `dill.settings['byref'] = True`. – Mike McKerns Nov 28 '15 at 22:25
  • @MikeMcKerns I now have `dill 0.2.5.dev0`, but the issue still exists. I updated the question with the output when I set `dill.detect.trace(True)`. `dill.settings['byref'] = True` did not help either. I am next going to try it on my Ubuntu system. – bzm3r Nov 28 '15 at 23:15
  • Where is the error from `dill.detect.trace(True)`? If that's the full trace, it looks like there's no error. – Mike McKerns Nov 29 '15 at 00:06
  • @MikeMcKerns There is only trace output when dumping (which seems fine, I agree). When loading, I have set up `dill.detect.trace(True)`, but it doesn't print out anything? I can paste the compiler trace, as that's all I see in my interpreter. (I have updated the Github repo, so you can see where I have placed the `dill.detect.trace(True)` sets.) – bzm3r Nov 29 '15 at 00:24
  • @MikeMcKerns Had to reset my Ubuntu installation because I hadn't used it in a while. Now it's in order, and I tried out the script. Works perfectly okay. I suppose you should blame Windows? I added the `windows` tag to the question too. – bzm3r Nov 29 '15 at 01:25
  • Since it's a more natural workflow for resolving an issue, would you mind submitting this as an issue on the `dill` github page? You can post the entire traceback there (or here). Your success on Ubuntu at least tells me that I need to see the traceback and/or kick the tires on this thing on windows. (Groan) – Mike McKerns Nov 29 '15 at 10:38
  • @MikeMcKerns see: https://github.com/uqfoundation/dill/issues/140 – bzm3r Nov 29 '15 at 18:14

2 Answers


I can tell from your question that you are probably doing something like this, with a method on the class that attempts to pickle the instance itself. It's ill-advised to do that, if you are doing that… it's much more sane to use pkl.dump external to the class instead (where pkl is pickle or dill etc). However, it can still work with this design, see below:

>>> class Thing(object):
...   def pickle_myself(self, pkl_file_path):
...     with open(pkl_file_path, 'w+') as f:
...       pkl.dump(self, f, protocol=2)
... 
>>> import dill as pkl
>>> 
>>> t = Thing()
>>> t.pickle_myself('foo.pkl')

Then restarting...

Python 2.7.10 (default, Sep  2 2015, 17:36:25) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('foo.pkl', 'r')
>>> t = dill.load(f)
>>> t
<__main__.Thing object at 0x1060ff410>
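
For comparison, the "external" approach mentioned above (calling pkl.dump outside the class) could look roughly like this. It is only a sketch: it uses the standard-library pickle rather than dill, the helper names save and load are made up for illustration, and a plain dict stands in for the instance. The files are opened in binary mode ('wb' / 'rb'), which is what the pickle documentation asks for with protocols >= 1.

```python
import os
import pickle
import tempfile


def save(obj, path):
    # Binary mode ('wb') keeps the pickle bytes unchanged on every OS.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def load(path):
    # Read back with the matching binary mode ('rb').
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: round-trip a plain dict through a temporary file.
state = {"ball_radius": 5, "num_balls": 5}
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, "state.pkl")
save(state, pkl_path)
restored = load(pkl_path)
print(restored == state)
```

Keeping the file handling in one pair of functions also means the open mode only has to be right in one place.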

If you have a much more complicated class, which I'm sure you do, then you are likely to run into trouble, especially if that class uses another file that is sitting in the same directory.

>>> import dill
>>> from bar import Zap
>>> print dill.source.getsource(Zap)
class Zap(object):
    x = 1
    def __init__(self, y):
        self.y = y

>>> 
>>> class Thing2(Zap):   
...   def pickle_myself(self, pkl_file_path):
...     with open(pkl_file_path, 'w+') as f:
...       dill.dump(self, f, protocol=2)
... 
>>> t = Thing2(2)
>>> t.pickle_myself('foo2.pkl')

Then restarting…

Python 2.7.10 (default, Sep  2 2015, 17:36:25) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> f = open('foo2.pkl', 'r')
>>> t = dill.load(f)
>>> t
<__main__.Thing2 object at 0x10eca8090>
>>> t.y
2
>>> 

Well… shoot, that works too. You'll have to post your code, so we can see what pattern you are using that dill (and pickle) fails for. I know that having one module import another that is not "installed" (i.e. it just sits in some local directory) and expecting the serialization to "just work" doesn't work in all cases.

See dill issues https://github.com/uqfoundation/dill/issues/128 and https://github.com/uqfoundation/dill/issues/129, and the SO question "Why dill dumps external classes by reference, no matter what?", for some examples of failure and potential workarounds.
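
To see why the defining module has to be importable at load time, it can help to look at what the pickle actually stores. The sketch below uses only the standard library (pickle and pickletools, not dill), with fractions.Fraction as a stand-in for a user-defined class; the helper name referenced_globals is made up for illustration. It lists the "module name" pairs that the unpickler will try to import.

```python
import pickle
import pickletools
from fractions import Fraction


def referenced_globals(data):
    """Return the 'module name' pairs a pickle will import when loaded."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


# A class instance is stored by reference: only the module and class name
# are written, not the class body itself.
data = pickle.dumps(Fraction(1, 3), protocol=2)
refs = referenced_globals(data)
print(refs)
```

If the module named there cannot be imported in the loading process (or the name is damaged in the file), loading fails with an ImportError even though the dump looked fine.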

EDIT with regard to updated question:

I don't see your issue. Running from the command line, importing from the interpreter (import test_serialization), and running the script in the interpreter (as below, and indicated in your steps 3-5) all work. That leads me to think you might be using an older version of dill?

>>> import os
>>> import make_persist_load
>>> 
>>> MAKE_AND_PERSIST = False #True
>>> LOAD = (not MAKE_AND_PERSIST)
>>> 
>>> cwd = os.getcwd()
>>> store_path = os.path.join(cwd, "test_stored_env.pkl")
>>> 
>>> if MAKE_AND_PERSIST == True:
...     make_persist_load.make_env_and_persist()
... 
>>> if LOAD == True:
...     loaded_env = make_persist_load.load_env(store_path)
... 
>>> 

EDIT based on discussion in comments:

Looks like it's probably an issue with Windows, as that seems to be the only OS where the error appears.

EDIT after some work (see: https://github.com/uqfoundation/dill/issues/140):

Using this minimal example, I can reproduce the same error on Windows, while on MacOSX it still works…

# test.py
class Environment():
    def __init__(self):
        pass

and

# doit.py
import test
import dill

env = test.Environment()
path = "test.pkl"
with open(path, 'w+') as f:
    dill.dump(env, f)

with open(path, 'rb') as _f:
    _env = dill.load(_f)
    print _env

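However, if you use open(path, 'r') as _f, it works on both Windows and MacOSX. The difference is most likely the file mode rather than __import__ itself: on Windows, a file opened in text mode ('w+') has every \n written out as \r\n, so reading it back in binary mode ('rb') hands the unpickler a module name with a stray trailing \r, and the import of that name fails. Throwing an ImportError is still a confusing way to surface that… but this one small change should make it work. The more robust fix is to use binary mode on both sides: 'wb' to dump and 'rb' to load.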
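
Here is a small sketch of the file-mode effect that runs on any OS. It makes two assumptions for the sake of being self-contained: it uses the standard-library pickle instead of dill, and fractions.Fraction instead of the Environment class. The first part round-trips an object with binary mode on both sides; the second part simulates Windows text-mode newline translation directly on the pickle bytes and shows the import failure that results.

```python
import os
import pickle
import tempfile
from fractions import Fraction

obj = Fraction(1, 3)

# 1) Binary mode on both sides: the bytes on disk are exactly what pickle wrote.
path = os.path.join(tempfile.mkdtemp(), "test.pkl")
with open(path, "wb") as f:
    pickle.dump(obj, f, protocol=2)
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded == obj)

# 2) Simulate what Windows text mode does on write: every "\n" becomes "\r\n".
good = pickle.dumps(obj, protocol=2)
mangled = good.replace(b"\n", b"\r\n")

try:
    pickle.loads(mangled)
    error = None
except ImportError as exc:
    # The module name read from the stream now ends in "\r", so the import fails.
    error = exc
print(repr(error))
```

Because the stray \r is invisible when the message is printed to a console, the error reads as if a perfectly importable module could not be found.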

Mike McKerns
  • Hi Mike -- pleased to hear from you! I set up a toy example that reproduces the error (see edit in my question), and it's all stored up in a [github repo with instructions in the `README`](https://github.com/bmer/serialization_issue/tree/master). Please let me know if anything needs clarification. On my computer, running the test as instructed causes `ImportError: No module named test_environment`, whether I am using `pickle` or `dill`. Finally, thanks for `dill`! – bzm3r Nov 28 '15 at 16:54
  • thanks so much for looking into this for me :); I love `dill` and basically use it for everything I can now – bzm3r Dec 11 '15 at 04:53

In case someone else is having the same problem: I hit it running Python 2.7, and the cause was that the pickle file had been created on Windows while I was loading it on Linux. What I had to do was run dos2unix on the file. dos2unix has to be installed first, using

sudo yum install dos2unix

Then convert the pickle file, for example:

dos2unix data.p
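
If dos2unix is not available, roughly the same conversion can be sketched in Python. This is an illustration only: the function name crlf_to_lf and the file names are made up, and the standard-library pickle is used to build a deliberately damaged file so the example is self-contained. Note that a byte-for-byte replacement like this would also alter any \r\n pair that legitimately belongs to the pickled data, so regenerating the pickle in binary mode is the safer option when that is possible.

```python
import os
import pickle
import tempfile


def crlf_to_lf(src_path, dst_path):
    """Write a copy of src_path with every CRLF pair replaced by LF."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data.replace(b"\r\n", b"\n"))


# Usage: build a "damaged" file by hand, repair it, and load the result.
tmp_dir = tempfile.mkdtemp()
damaged = os.path.join(tmp_dir, "data.p")
repaired = os.path.join(tmp_dir, "data_fixed.p")

original = ["a", "b", "c"]
with open(damaged, "wb") as f:
    # Protocol 0 is line-oriented, so newline translation hits it directly.
    f.write(pickle.dumps(original, protocol=0).replace(b"\n", b"\r\n"))

crlf_to_lf(damaged, repaired)
with open(repaired, "rb") as f:
    recovered = pickle.load(f)
print(recovered == original)
```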
MMSA