
I've built a k-NN model with a custom distance metric, cosine distance:

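import numpy as np
from scipy.sparse import load_npz
from sklearn.externals import joblib  # matches the traceback; newer scikit-learn needs `import joblib`
from sklearn.neighbors import NearestNeighbors

def cosine_distance(x, y):
    # 1 - cosine similarity of the two vectors
    x_module = np.sqrt(np.sum(x**2))
    y_module = np.sqrt(np.sum(y**2))
    return 1 - np.dot(x, y) / (x_module * y_module)

# load data (CSVHelper is my own helper; its import is not shown)
x_feature = load_npz('data/movie_features.npz').toarray()
movies = CSVHelper.read_movie('data/IMDB_Movies_Master_data.csv')

neigh = NearestNeighbors(n_neighbors=5, metric=cosine_distance)
neigh.fit(x_feature)

# save the k-NN model
joblib.dump(neigh, 'knn.pkl')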

Now in a second script, I load the model using joblib:

knn_classifier = joblib.load('knn.pkl')

However, it throws the following error:

  File "<stdin>", line 1, in <module>
  File "/home/A/deeplearning_env/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/home/A/deeplearning_env/lib/python3.5/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
    obj = unpickler.load()
  File "/usr/lib/python3.5/pickle.py", line 1039, in load
    dispatch[key[0]](self)
  File "/usr/lib/python3.5/pickle.py", line 1334, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python3.5/pickle.py", line 1388, in find_class
    return getattr(sys.modules[module], name)
AttributeError: module '__main__' has no attribute 'cosine_distance'

How can I tell joblib that the model uses a custom metric? I tried defining the function cosine_distance in the loading script as well, but it doesn't work.
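
One way to sidestep the problem, sketched here under the assumption that the custom function is only meant to compute plain cosine distance, is to use scikit-learn's built-in `'cosine'` metric with brute-force search. The fitted model then holds a string rather than a reference to a function in `__main__`, so unpickling no longer needs to find `cosine_distance`. The random toy data and the use of plain `pickle` (which the traceback shows joblib relying on) are assumptions for illustration.

```python
import pickle

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy stand-in for the movie feature matrix (assumption: dense, 20 x 4).
rng = np.random.RandomState(0)
x_feature = rng.rand(20, 4)

# The metric is given by name, so no user-defined function is pickled.
neigh = NearestNeighbors(n_neighbors=5, metric="cosine", algorithm="brute")
neigh.fit(x_feature)

# Round-trip through pickle, as joblib.dump / joblib.load would do on disk.
restored = pickle.loads(pickle.dumps(neigh))

# Query the 5 nearest neighbours of the first row.
distances, indices = restored.kneighbors(x_feature[:1])
```

This only applies when the built-in metric matches what the custom function computes; a genuinely custom metric still has to be importable wherever the model is loaded.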

desertnaut
lenhhoxung
  • You need to have that function defined in the script where you are loading the file or import that function. You can look at alternatives here: https://stackoverflow.com/q/1253528/3374996 – Vivek Kumar Jul 16 '18 at 11:18
  • Can you share both the scripts in which you save and then load the pickled model? – Vivek Kumar Jul 16 '18 at 12:07
  • It turns out that this may be a problem with the Django framework. I'm building a Django web app, and the code that loads the model lives in `views.py`. Adding the function `cosine_distance` to `views.py` doesn't work, but adding it to `manage.py` does. Is there a nicer way to fix this issue? – lenhhoxung Jul 16 '18 at 12:24
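
The behaviour described in the comments is consistent with how pickle stores functions: by reference, as a module name plus a function name, not as code. A function defined in the script that was run directly is recorded as `__main__.cosine_distance`, and under Django the `__main__` module is presumably `manage.py`, which would explain why the fix only works there. A minimal sketch of the usual alternative follows: keep the metric in an importable module so the pickle refers to that module instead. The module name `knn_metrics` is hypothetical, and the module is written to a temporary directory only to keep the example self-contained.

```python
import importlib
import os
import pickle
import sys
import tempfile

# Hypothetical module name, chosen for illustration only.
MODULE_NAME = "knn_metrics"
MODULE_SOURCE = '''
import math

def cosine_distance(x, y):
    """Return 1 - cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    x_norm = math.sqrt(sum(a * a for a in x))
    y_norm = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (x_norm * y_norm)
'''

# Write the module to a temporary directory and make it importable.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, MODULE_NAME + ".py"), "w") as f:
    f.write(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
knn_metrics = importlib.import_module(MODULE_NAME)

# pickle records the function as "<module>.<name>", here knn_metrics.cosine_distance.
payload = pickle.dumps(knn_metrics.cosine_distance)
stored_in_module = knn_metrics.cosine_distance.__module__

# Loading works in any process that can import knn_metrics.
restored = pickle.loads(payload)
```

In the setup from the question, this would mean moving `cosine_distance` into a module inside the Django project, importing it from there in the training script before calling `NearestNeighbors(..., metric=cosine_distance)`, and re-saving the model, so that `views.py` can load it without anything being defined in `manage.py`.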
