I am using KNeighborsRegressor, but I would like to use it with a custom distance function. My training set is a pandas DataFrame that looks like this:

week_day  hour  minute  temp  humidity
0         1     9       0     1      
1         1     9       0     1      
2         1     9       0     1      
3         1     9       0     1      
4         1     9       1     1     
  ...

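import numpy as np
from sklearn.metrics import DistanceMetric  # sklearn.neighbors.DistanceMetric in older releases
from sklearn.neighbors import KNeighborsRegressor

def customDistance(a, b):
    # Log both arguments to see what the metric is called with.
    print(a, b)
    return np.sum((a - b) ** 2)  # squared Euclidean distance

dt = DistanceMetric.get_metric("pyfunc", func=customDistance)

knn_regression = KNeighborsRegressor(n_neighbors=15, metric='pyfunc', metric_params={"func": customDistance})
knn_regression.fit(trainSetFeatures, trainSetResults)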

I have also tried passing customDistance directly to the KNeighborsRegressor constructor:

knn_regression = KNeighborsRegressor(n_neighbors=15, metric=customDistance)

Either way the function gets executed, but the results are odd. I would expect the inputs A and B to be rows from my DataFrame, but instead I get:

[0.87716989 11.46944914 1.00018801 1.10616031 1.] [ 1. 9. 0. 1. 1.]

The second argument, B, is clearly a row from my training set, but I cannot work out where the first one comes from. If someone could explain this, or post an example of the right way to plug a custom distance function into this algorithm, it would be highly appreciated.
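One way to narrow this down is to record every pair the metric receives and check it against the training rows. The sketch below is not a confirmed answer: the toy data, the names train_features, train_results and seen_pairs, and the choice of algorithm="brute" are all assumptions made for illustration. The idea behind forcing brute-force search is that tree-based search may evaluate the metric on internal points (such as node centroids) rather than only on actual rows, which could explain the unfamiliar first argument.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data shaped like the training set in the question.
rng = np.random.RandomState(0)
train_features = pd.DataFrame({
    "week_day": rng.randint(0, 7, size=40),
    "hour": rng.randint(0, 24, size=40),
    "minute": rng.randint(0, 60, size=40),
    "temp": rng.randint(0, 3, size=40),
    "humidity": rng.randint(0, 3, size=40),
})
train_results = rng.rand(40)

seen_pairs = []  # every (a, b) pair the metric is called with


def custom_distance(a, b):
    # Squared Euclidean distance, as in the question; also logs its inputs.
    seen_pairs.append((np.array(a, dtype=float), np.array(b, dtype=float)))
    return np.sum((a - b) ** 2)


# Assumption: algorithm="brute" skips tree construction, so the metric
# should only be evaluated between query rows and training rows.
knn = KNeighborsRegressor(n_neighbors=15, metric=custom_distance,
                          algorithm="brute")
knn.fit(train_features, train_results)

seen_pairs.clear()  # discard any calls made while fitting
predictions = knn.predict(train_features.iloc[:3])

# Check whether every argument the metric saw is an actual training row.
train_rows = {tuple(row) for row in train_features.to_numpy(dtype=float)}
all_real_rows = all(
    tuple(a) in train_rows and tuple(b) in train_rows for a, b in seen_pairs
)
print("metric calls:", len(seen_pairs))
print("all arguments are training rows:", all_real_rows)
```

If the check reports that all arguments are real rows under brute-force search but not under the default algorithm, that would point at the tree structure as the source of the extra vector.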

Thanks in advance.

Best regards, Klemen
