In a Jupyter notebook, if I define a class, instantiate it, and save the object using joblib, I can load it back just fine:

import joblib

class Duck():
    def quack(self):
        print("Quack!")

my_duck = Duck()
joblib.dump(my_duck, "my_duck.joblib")
loaded_duck = joblib.load("my_duck.joblib")
loaded_duck.quack()

output:

Quack!

But if I try to load it in a new notebook (or even a regular .py script), it fails:

import joblib

loaded_duck = joblib.load("my_duck.joblib")
loaded_duck.quack()

output:

AttributeError: module '__main__' has no attribute 'Duck'

How can I fix that?

J. Doe
  • That is actually a pickle question: https://stackoverflow.com/questions/27732354/unable-to-load-files-using-pickle-and-multiple-modules – Woodly0 Apr 27 '23 at 14:28

2 Answers

I don't know if you still have this problem, since the question is a month old; however, in case someone else runs into it:

You are getting that error because the new notebook (where you are loading the object) does not have the definition of the class.

Try to import the class Duck in the new notebook first:

from script_duck import Duck  # script_duck: the module that defines Duck

import joblib

loaded_duck = joblib.load("my_duck.joblib")
loaded_duck.quack()
Jose

Even though the question is old and the existing answer is correct, I would like to expand the discussion, since I lost a considerable amount of time understanding this problem.

joblib is a package for object persistence, similar to pickle. The objects it stores are instances of classes; a dataframe, for example, is typically an instance of the pandas.DataFrame class. If you serialize such an object, i.e. joblib.dump(my_df, "my_df.joblib"), it is stored in a binary file that is tagged with its class name(s), including the module the class lives in.

If you deserialize the file to get the object back, i.e. joblib.load("my_df.joblib"), Python has to find the class definition in order to instantiate it. In our dataframe example, that is pandas.DataFrame. So if pandas is not installed in the current context (different script, different notebook, etc.), you will get the famous ModuleNotFoundError, since Python does not know how to instantiate your dataframe.
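To make the "tagged by its class name" point concrete, here is a minimal sketch. It uses the standard pickle module as a stand-in for joblib (joblib builds on pickle), and animals_demo is a hypothetical in-memory module invented for this illustration. Here quack returns its string instead of printing it.

```python
import pickle
import sys
import types

# "animals_demo" is a hypothetical module name used only for this sketch.
animals = types.ModuleType("animals_demo")
sys.modules["animals_demo"] = animals

# Define Duck inside that module, so Duck.__module__ == "animals_demo".
exec(
    "class Duck:\n"
    "    def quack(self):\n"
    "        return 'Quack!'\n",
    animals.__dict__,
)

payload = pickle.dumps(animals.Duck())

# The serialized bytes reference the class by module name and class name...
stores_module = b"animals_demo" in payload
stores_class = b"Duck" in payload
# ...but they do not contain the class body (e.g. the method name).
stores_method = b"quack" in payload

print(stores_module, stores_class, stores_method)
```

So the file only says where to find the class; the code of the class itself has to be available again at load time.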

Now transfer this intuition to your custom class: if its definition lives in the main module when you create the object, then the same definition must be in the main module the moment you load it. In your case that is __main__.Duck, which means you'd need to copy-paste your class definition into your new notebook. This is, however, not a very practical approach. So I suggest you create an additional package, e.g. a folder named utils or similar, where you put all the scripts containing your custom classes. The structure could then look like this:
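The "same definition must be present at load time" rule can be reproduced without two notebooks. The following is only a sketch under assumptions: pickle stands in for joblib, and session_demo is a hypothetical module playing the role of the notebook's __main__.

```python
import pickle
import sys
import types

# Source of the class, as it would be typed into a notebook cell.
SOURCE = (
    "class Duck:\n"
    "    def quack(self):\n"
    "        return 'Quack!'\n"
)

# "session_demo" is a hypothetical stand-in for the notebook's __main__.
session = types.ModuleType("session_demo")
sys.modules["session_demo"] = session

exec(SOURCE, session.__dict__)          # first session: define Duck
payload = pickle.dumps(session.Duck())  # ...and serialize an instance

del session.Duck                        # new session: Duck was never defined
try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:           # same kind of error as in the question
    error = exc

exec(SOURCE, session.__dict__)          # paste the definition back in
restored = pickle.loads(payload)        # now the lookup succeeds

print(type(error).__name__, restored.quack())
```

Loading fails while the module has no Duck attribute and works again as soon as a class with the same name exists in the same module.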

yourProject/
│
├── notebook.ipynb
├── utils/
│   ├── __init__.py
│   └── animals.py
│
└── my_duck.joblib

And within animals.py:

class Duck():
    def quack(self):
        print("Quack!")

To import the custom class now, you would use:

from utils.animals import Duck

This way, joblib tags the object as utils.animals.Duck, and you can load it anywhere your custom module is available (e.g. by copying the utils folder). Just make sure the import path stays exactly the same: on load, Python imports utils.animals to find Duck, so the utils package must be importable under that name, for example by sitting in the working directory or elsewhere on sys.path.
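As a rough end-to-end check of this layout, the sketch below builds a throwaway package in a temporary directory, serializes a Duck, and loads it back after forgetting the import. Assumptions: pickle is used in place of joblib, the package is called utils_demo (a hypothetical name chosen to avoid clashing with any real utils package), and quack returns its string instead of printing it.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Build a throwaway project: <tmp>/utils_demo/{__init__.py, animals.py}
project = Path(tempfile.mkdtemp())
package = project / "utils_demo"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "animals.py").write_text(
    "class Duck:\n"
    "    def quack(self):\n"
    "        return 'Quack!'\n"
)

# Make the project root importable, as the notebook's folder would be.
sys.path.insert(0, str(project))
importlib.invalidate_caches()

from utils_demo.animals import Duck

payload = pickle.dumps(Duck())  # tagged as utils_demo.animals.Duck

# Simulate a fresh session: forget the class and the imported modules.
del Duck
for name in ("utils_demo.animals", "utils_demo"):
    sys.modules.pop(name, None)

# No explicit import needed: the loader imports utils_demo.animals itself,
# as long as the package can be found on sys.path.
loaded = pickle.loads(payload)
print(type(loaded).__module__, loaded.quack())
```

If the package were renamed or moved off sys.path between saving and loading, the final load would fail with a ModuleNotFoundError instead.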

Woodly0