I am using KNeighborsRegressor, but I would like to use it with a custom distance function. My training set is a pandas DataFrame that looks like this:
week_day hour minute temp humidity
0 1 9 0 1
1 1 9 0 1
2 1 9 0 1
3 1 9 0 1
4 1 9 1 1
...
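import numpy as np
from sklearn.metrics import DistanceMetric  # sklearn.neighbors.DistanceMetric in older releases
from sklearn.neighbors import KNeighborsRegressor

def customDistance(a, b):
    print(a, b)  # show the two vectors the function receives
    return np.sum((a - b) ** 2)  # squared Euclidean distance

dt = DistanceMetric.get_metric("pyfunc", func=customDistance)
knn_regression = KNeighborsRegressor(n_neighbors=15, metric='pyfunc', metric_params={"func": customDistance})
knn_regression.fit(trainSetFeatures, trainSetResults)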
I have also tried passing customDistance directly to the KNeighborsRegressor constructor:
knn_regression = KNeighborsRegressor(n_neighbors=15, metric=customDistance)
Either way the function gets executed, but the results are strange. I would expect the function's inputs a and b to be rows from my DataFrame, but instead I get:
[0.87716989 11.46944914 1.00018801 1.10616031 1.] [ 1. 9. 0. 1. 1.]
The second argument, b, is clearly a row from my training set, but I cannot figure out where the first one came from. If someone could explain this, or post an example of the right way to plug a custom distance function into this algorithm, it would be highly appreciated.
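For anyone who wants to reproduce this, here is a minimal self-contained sketch of the second variant (passing the callable as metric). The data is synthetic and the names custom_distance, train_features, train_targets and query are only illustrative; they are not my real data. I also set algorithm="brute" here, on the assumption that brute-force search only evaluates the function on pairs of actual query and training rows, which makes the inputs easier to reason about than with the default tree-based search.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor


def custom_distance(a, b):
    # a and b arrive as 1-D NumPy arrays (one feature vector each),
    # not as pandas rows.
    return np.sum((a - b) ** 2)  # squared Euclidean distance


# Synthetic stand-in for the real training set (assumption: 5 numeric columns).
columns = ["week_day", "hour", "minute", "temp", "humidity"]
rng = np.random.default_rng(0)
train_features = pd.DataFrame(rng.random((30, 5)), columns=columns)
train_targets = rng.random(30)
query = pd.DataFrame(rng.random((4, 5)), columns=columns)

# Custom callable metric with brute-force neighbor search.
knn_custom = KNeighborsRegressor(
    n_neighbors=3, metric=custom_distance, algorithm="brute"
)
knn_custom.fit(train_features, train_targets)
pred_custom = knn_custom.predict(query)

# Reference model with the default (Euclidean) metric for comparison.
knn_default = KNeighborsRegressor(n_neighbors=3, algorithm="brute")
knn_default.fit(train_features, train_targets)
pred_default = knn_default.predict(query)

print(pred_custom)
print(pred_default)
```

Since squared Euclidean distance ranks neighbors in the same order as plain Euclidean distance, I would expect both models to give the same predictions on this data, which makes it a convenient sanity check. Note that squared Euclidean distance does not satisfy the triangle inequality, so I am not sure it is safe to use with the tree-based algorithms.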
Thanks in advance.
Best regards, Klemen