I want to apply k nearest neighbour with a custom distance function. I have not found a way to pass this function using packages like FNN or class. Is there a way to pass a function or distance matrix to an existing knn algorithm in some R package or do I have to write it from scratch?

Background

To elaborate on my problem: my data includes columns for

  • start latitude
  • start longitude
  • start country
  • end latitude
  • end longitude
  • end country
  • start+end country
  • means of transportation
  • distance
  • price

and I want to estimate the price based on the other factors. The distance function needs to include the haversine distance to measure how similar the start and end points are in latitude and longitude, so I cannot use a built-in distance such as Euclidean or Minkowski.
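For reference, a minimal base-R sketch of the haversine part, assuming coordinates in decimal degrees and a mean Earth radius of 6371 km. The function names haversine and haversine_matrix and the example coordinates are made up for illustration; they do not come from any package.

```r
# Haversine great-circle distance in km between points given in degrees.
# r = 6371 km is the usual mean-radius approximation of the Earth.
haversine <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Pairwise n x n distance matrix for vectors of latitudes and longitudes.
haversine_matrix <- function(lat, lon) {
  n <- length(lat)
  i <- rep(seq_len(n), times = n)  # row index, varies fastest (column-major)
  j <- rep(seq_len(n), each = n)   # column index
  matrix(haversine(lat[i], lon[i], lat[j], lon[j]), nrow = n)
}

# Hypothetical example coordinates (roughly Berlin, Paris, Madrid).
lat <- c(52.52, 48.8566, 40.4168)
lon <- c(13.405, 2.3522, -3.7038)
d_start <- haversine_matrix(lat, lon)
```

The same function could be applied to the end coordinates, and the two matrices combined (for example summed) into one custom distance matrix; how to weight them against the other columns is a modelling choice.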

Open to Python suggestions

If somebody believes that for some reason this would be much easier to do in Python (provided the same programming skills in both languages) using some fancy package, I am also very open to additional information about this.

mondano
  • Did you check this answer? http://stackoverflow.com/a/32869977/3871924 – agenis Oct 07 '16 at 15:31
  • Yes, but as far as I understand, then I'd have to do almost the whole knn myself. I would like to let a package deal with normalization, cross validation and so on, just pass another distance function. FastKNN does not do this, as far as I can see. – mondano Oct 08 '16 at 14:34

1 Answer

After searching a bit, I found a package called KODAMA that handles things like 10-fold cross-validation and has a kNN prediction function, knn.predict, which works on a distance matrix computed separately by the knn.dist function.

It appears that the output of knn.dist is nothing but a standard distance matrix, symmetric with a zero diagonal, of class matrix. So we can create one separately; the first two lines below are equivalent:

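library(KODAMA)  # provides knn.dist and knn.predict
kdist <- KODAMA::knn.dist(x)
kdist <- as.matrix(dist(x, upper = TRUE, diag = TRUE))  # it also works
# train and test are row indices into the distance matrix; y holds the responses
knn.predict(train, test, y, kdist, k = 3, agg.meth = "majority")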

You might try it with your custom distance matrix. Hope it helps.
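To make clear what predicting from a precomputed distance matrix amounts to, here is a minimal base-R sketch of kNN regression that needs no package. It is not KODAMA's implementation: knn_regress is a hypothetical helper, the toy data are made up, and averaging the neighbours' responses is an assumption that suits a numeric target such as price.

```r
# Predict a numeric response for the `test` rows from the k nearest `train`
# rows, using only a precomputed distance matrix `dmat`.
# `train` and `test` are row indices; `y` holds the response for every row.
knn_regress <- function(train, test, y, dmat, k = 3) {
  vapply(test, function(i) {
    # order the training rows by distance to row i, keep the k closest
    nearest <- train[order(dmat[i, train])[seq_len(k)]]
    mean(y[nearest])
  }, numeric(1))
}

# Toy data (made up): 6 trips on a line, price grows with position.
pos   <- c(0, 1, 2, 10, 11, 12)
price <- c(10, 12, 14, 50, 52, 54)
dmat  <- as.matrix(dist(pos))  # stand-in for any custom distance matrix

train <- c(1, 2, 3, 4, 5)
test  <- 6
pred  <- knn_regress(train, test, price, dmat, k = 2)
```

Any symmetric matrix with a zero diagonal can take the place of dmat, so a matrix built from haversine distances fits the same interface.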

agenis