1

Someone posted a similar question here, but I couldn't get it to work for my case.

See:

Sklearn kNN usage with a user defined metric

I want to define my own metric and use it in KNN.
It seems I have a signature problem, but I don't understand it. Thanks.

gamma=2


def mydist2 (x,y):
    z=(x-y)
    return (z[0]^2+gamma*z[1]^2) 
neigh = KNeighborsClassifier(n_neighbors=3,metric=mydist2)

neigh.fit(traindata,train_labels)
neigh.score(testdata,test_labels)

ValueError Traceback (most recent call last)
<ipython-input-81-f934c7b5c9b3> in <module>()
----> 1 neigh.fit(traindata,train_labels)
      2 neigh.score(testdata,test_labels)

C:\Users\Fagui\Anaconda2\lib\site-packages\sklearn\neighbors\base.pyc in fit(self, X, y)
    801 self._y = self._y.ravel()
    802
    803 return self._fit(X)
    804
    805

C:\Users\Fagui\Anaconda2\lib\site-packages\sklearn\neighbors\base.pyc in _fit(self, X)
    256 self._tree = BallTree(X, self.leaf_size,
    257 metric=self.effective_metric_,
--> 258 **self.effective_metric_params_)
    259 elif self._fit_method == 'kd_tree':
    260 self._tree = KDTree(X, self.leaf_size,

sklearn/neighbors/binary_tree.pxi in sklearn.neighbors.ball_tree.BinaryTree.__init__ (sklearn\neighbors\ball_tree.c:8381)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.DistanceMetric.get_metric (sklearn\neighbors\dist_metrics.c:4032)()

sklearn/neighbors/dist_metrics.pyx in sklearn.neighbors.dist_metrics.PyFuncDistance.__init__ (sklearn\neighbors\dist_metrics.c:10628)()

ValueError: func must be a callable taking two arrays

As a bonus question, I'd like to pass gamma as an argument.

Thanks very much.

greybeard
Fagui Curtain

3 Answers

2

From the KNeighborsClassifier documentation: the metric argument must be a string or a DistanceMetric object, and you passed a function.

To pass your own metric, you have to specify metric='pyfunc' and add the keyword argument func=mydist2.

The similar question explains that a custom metric can only be used when algorithm='ball_tree' is set, and you kept the default, which is 'auto'.

I think the following should work:

neigh = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree',metric='pyfunc', func=mydist2)

To pass gamma as an argument, I would try:

def mydist2(x, y, gamma=2):
    z = x - y
    # ** is exponentiation in Python; ^ is bitwise XOR
    return z[0]**2 + gamma*z[1]**2

and add the argument metric_params={'gamma':2}:

neigh = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree', metric='pyfunc', func=mydist2, metric_params={'gamma':2})

But I'm not sure; there is no clear example in the docs.
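For what it's worth, here is a minimal sketch of passing gamma through metric_params, assuming a scikit-learn version that accepts a callable directly as metric (as the questioner's own follow-up suggests). The toy data is made up for illustration, and the sketch returns the square root of the questioner's expression so that it behaves like a proper distance for the ball tree; that is my change, and it does not alter which neighbours are nearest.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def mydist2(x, y, gamma=2):
    """Weighted Euclidean distance between two 1-D arrays."""
    z = x - y
    # ** is exponentiation; ^ would be bitwise XOR and fails on floats.
    # The square root is an assumption added here, not in the question.
    return np.sqrt(z[0] ** 2 + gamma * z[1] ** 2)


# Toy data (hypothetical): two well-separated clusters
traindata = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                      [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_labels = np.array([0, 0, 0, 1, 1, 1])

neigh = KNeighborsClassifier(
    n_neighbors=3,
    algorithm='ball_tree',
    metric=mydist2,               # the callable itself
    metric_params={'gamma': 2},   # forwarded to mydist2 as keyword arguments
)
neigh.fit(traindata, train_labels)

# One query point near each cluster
predictions = neigh.predict(np.array([[0.2, 0.1], [5.5, 5.2]]))
print(predictions)
```

Changing the value in metric_params changes the weight on the second coordinate without touching the function itself.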

arthur
2

My question was very stupid.

The syntax was correct.

The problem is that exponentiation in Python is written with **, not ^.

Hence 16 = 2**4, not 2^4.
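A quick check of the two operators in plain Python, as a small illustrative sketch. The variable names are made up for the example. It also shows that ^ raises a TypeError on floats, which is presumably what scikit-learn ran into when it tried the metric on float arrays and then reported "func must be a callable taking two arrays".

```python
# ** is the power operator; ^ is bitwise XOR on integers
power = 2 ** 4   # 16
xor = 2 ^ 4      # 0b010 XOR 0b100 = 0b110 = 6

# XOR is not defined for floats, so a metric written with ^ fails on float data
try:
    1.5 ^ 2
    xor_on_float_failed = False
except TypeError:
    xor_on_float_failed = True

print(power, xor, xor_on_float_failed)  # 16 6 True
```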

Fagui Curtain
-1

Define the metric in Cython, build the module to create the library, and call it from your main code.

Sklearn is optimized: it uses Cython and several processes to run as fast as possible. Pure Python code, especially when it is called many times, will slow your code down. I recommend writing your custom metric in Cython. There is a tutorial you can follow right here.

  • 1
    Welcome to Stack Overflow! A link to a potential solution is always welcome, but please [add context around the link](//meta.stackoverflow.com/a/8259) so your fellow users will have some idea what it is and why it’s there. **Always quote the most relevant part of an important link, in case the target site is unreachable or goes permanently offline.** Take into account that being _barely more than a link to an external site_ is a possible reason as to [Why and how are some answers deleted?](//stackoverflow.com/help/deleted-answers). – Machavity Jul 05 '17 at 15:20