
I am new to scipy and can't quite figure out the unit of distance_upper_bound in scipy.spatial.KDTree.query. Is it in kilometers or radians?

PRIBAN91
1 Answer


It's exactly the value in terms of the metric you chose via the parameter p.

If you chose p=1, i.e. the Manhattan distance, then the distance between the vectors x=[1,2,3] and y=[2,3,4] is 3. If you set distance_upper_bound=2 and y is the nearest neighbor of x you are looking for, don't expect a correct result.

Remark: the parameter you are asking about is set to inf by default.
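
To make this concrete, here is a minimal sketch with toy data (my own illustration): with p=1 the query returns the Manhattan distance of 3, and with distance_upper_bound=2 scipy signals "no neighbor found" by returning a distance of inf and an index equal to the number of points in the tree.

import numpy as np
from scipy.spatial import KDTree

x = np.array([[1, 2, 3]])   # the single point stored in the tree
y = np.array([2, 3, 4])     # the query point

tree = KDTree(x)

# Manhattan distance (p=1): |1-2| + |2-3| + |3-4| = 3
dist, idx = tree.query(y, k=1, p=1)
print(dist, idx)   # 3.0 0

# The true neighbor lies outside distance_upper_bound=2, so scipy
# reports distance inf and the "missing" index tree.n (= 1 here).
dist, idx = tree.query(y, k=1, p=1, distance_upper_bound=2)
print(dist, idx)   # inf 1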

Your task seems to be about latitude/longitude points. In this case, I think you want to use the Haversine metric (disclaimer: I'm no expert in this area).

Sadly, this metric cannot, imho, be expressed as a p-norm, and p-norms are the only metrics supported in scipy's neighbor searches!

But: sklearn's BallTree can work with Haversine! (Its KDTree does not; it still only supports p-norms!)

There is probably a good reason (either mathematical or related to practical performance) why KDTree does not support Haversine while BallTree does. Don't try to roll your own Haversine KDTree blindly!

From here:

A binary search tree cannot handle the wraparound of the polar representation by design. You might need to transform the coordinates to a 3D cartesian space and then apply your favorite search algorithm, e.g., kD-Tree, Octree etc.
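
A minimal sketch of that cartesian workaround (my own illustration, not from the linked answer; the coordinates are only rough): map latitude/longitude onto the unit sphere, build scipy's KDTree on the 3D points, and query with the ordinary Euclidean metric. The returned distance is the chord length, which converts back to a great-circle distance.

import numpy as np
from scipy.spatial import KDTree

def latlon_to_xyz(lat_deg, lon_deg):
    # map degrees of latitude/longitude onto the unit sphere
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

points = latlon_to_xyz(np.array([52.52, 48.86, 40.71]),     # Berlin, Paris, New York (approx.)
                       np.array([13.40,  2.35, -74.01]))
tree = KDTree(points)

query = latlon_to_xyz(np.array([50.11]), np.array([8.68]))  # Frankfurt (approx.)
chord, idx = tree.query(query, k=1)

# chord length -> central angle -> great-circle distance (Earth radius ~6371 km)
angle = 2 * np.arcsin(chord / 2)
print(idx, angle * 6371)

For reference, the metric lists of sklearn's two tree types: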

from sklearn.neighbors import KDTree, BallTree

KDTree.valid_metrics
# ['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev',
#  'infinity']

BallTree.valid_metrics
# ['euclidean', 'l2', 'minkowski', 'p', 'manhattan', 'cityblock', 'l1', 'chebyshev',
#  'infinity', 'seuclidean', 'mahalanobis', 'wminkowski', 'hamming', 'canberra',
#  'braycurtis', 'matching', 'jaccard', 'dice', 'kulsinski', 'rogerstanimoto', 'russellrao',
#  'sokalmichener', 'sokalsneath', 'haversine', 'pyfunc']
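
A minimal sketch of the BallTree route (again my own illustration; the coordinates are only rough): sklearn's haversine metric expects (latitude, longitude) pairs in radians and returns distances in radians, so multiply by the Earth's radius (~6371 km) to get kilometers. query_radius additionally gives you all neighbors within a fixed distance.

import numpy as np
from sklearn.neighbors import BallTree

coords_deg = np.array([[52.52, 13.40],     # Berlin (approx.)
                       [48.86,  2.35],     # Paris
                       [40.71, -74.01]])   # New York

# haversine expects (lat, lon) in radians
tree = BallTree(np.radians(coords_deg), metric='haversine')

query = np.radians(np.array([[50.11, 8.68]]))  # Frankfurt (approx.)

dist, ind = tree.query(query, k=1)
print(ind[0], dist[0] * 6371)              # nearest point and its distance in km

# all points within 500 km of the query (radius also given in radians)
print(tree.query_radius(query, r=500 / 6371)[0])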
sascha
  • Let's say I have latitudes and longitudes as input. What should the value of p be then? Also, I thought the method would return all the nearest neighbors which are <= distance_upper_bound. Isn't that the case? – PRIBAN91 Oct 25 '17 at 14:21
  • That's a question not well suited here, I suppose, as it's more math-based. I'm not sure what people do in this research area. But yes: given some metric, you can ask for all neighbors with less distance with regard to this metric (see the addendum after these comments). Important: *metric*! Maybe **haversine**. – sascha Oct 25 '17 at 14:24
  • Thanks a lot! That saved me some trouble. I will implement my own k-d tree. – PRIBAN91 Oct 25 '17 at 14:36
  • Sure. I read it. It's better to implement my own, as it would help with ad-hoc parameter controls. – PRIBAN91 Oct 25 '17 at 14:39
  • Why? What controls? sklearn's NeighborSearches are preeeetty good. – sascha Oct 25 '17 at 14:39
  • I will give it a try. Thanks. – PRIBAN91 Oct 25 '17 at 14:41
  • And think about the alternative mentioned in my link. But I can't help with this. You may have more knowledge there (mapping). – sascha Oct 25 '17 at 14:42
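
(Addendum on the "all neighbors <= distance_upper_bound" point from the comments, with toy data: scipy's query still returns only the k nearest neighbors, merely cut off at the bound; for "everything within a radius" the call to use is query_ball_point.)

import numpy as np
from scipy.spatial import KDTree

pts = np.array([[0, 0], [1, 0], [3, 0]])   # toy data
tree = KDTree(pts)

# query: only the k nearest, truncated by distance_upper_bound
print(tree.query([0, 0], k=3, p=1, distance_upper_bound=2))
# (array([0., 1., inf]), array([0, 1, 3]))

# query_ball_point: *all* indices within the given radius
print(tree.query_ball_point([0, 0], r=2, p=1))
# the indices 0 and 1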