8

I can't figure out how to interpret the outputs of the haversine implementations in sklearn (version 20.2)

The documentation says,"Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.",so I should be able to convert to km multiplying by 6371 (great distance approx for radius).

A functioning distance calculation from two points would be as follows:

def distance(origin, destination):
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius = 6371 # km

    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c

    return d

distance([32.027240,-81.093190],[41.981876,-87.969982])
1263.103504537151

This is the correct distance.

Using the BallTree implementation:

from sklearn.neighbors import BallTree
test_points = [[32.027240,41.981876],[-81.093190,-87.969982]]
tree = BallTree(test_points,metric = 'haversine')
results = tree.query_radius(test_points,r = 10,return_distance  = True)

results[1]
array([array([0.        , 1.53274271]), array([1.53274271, 0.        ])],
      dtype=object)

Same for the distanceMetric implementation:

dist = DistanceMetric.get_metric('haversine')
dist.pairwise([[32.027240,41.981876],[-81.093190,-87.969982]])
array([[0.        , 1.53274271],
       [1.53274271, 0.        ]])

I also tried changing the order, in case it wasn't supposed to be input as [[lat1,lat2],[lon1,lon2]] and also didn't get results that I can interpret.

Does anyone know how I can get the distance in km from two coordinates using the sklearn implementations?

flyingmeatball
  • 7,457
  • 7
  • 44
  • 62

2 Answers2

11

So the issue is that sklearn requires everything to be in radians, but the latitude/longitude and radius I have were in degrees/meters respectively. Before using, I needed to do some conversions:

from sklearn.neighbors import BallTree
earth_radius = 6371000 # meters in earth
test_radius = 10 # meters

test_points = [[32.027240,41.981876],[-81.093190,-87.969982]]
test_points_rad = [[x[0] * np.pi / 180, x[1] * np.pi / 180] for x in test_points ]

tree = BallTree(test_points_rad, metric = 'haversine')
results = tree.query_radius(test_points, r=test_radius/earth_radius, return_distance  = True)
flyingmeatball
  • 7,457
  • 7
  • 44
  • 62
  • Just to clarify: the end result is the distance in meters, because you defined your earth_radius in meters, right? – labourday Sep 27 '20 at 21:12
  • Correct - radians are technically just an angle, so however you define the distance from the center of the earth to the outside is what the measurement will be in – flyingmeatball Sep 29 '20 at 16:13
  • The input to the query function should be in rads as well. And the returned distances are supposed to be multiplied by earth radius to get them in metres. – Shashwat Jan 07 '21 at 09:54
  • So I think you just need to multiply distances using the code below at the end: distances, indices = tree.query_radius(test_points, r=radius/earth_radius, return_distance=True) an then, FinlDist=distances * earth_radius – user3665906 Nov 02 '22 at 17:39
5

Just to clarify a previous answer by @flyingmeatball, a few things:

  1. Maybe due to changes in sklearn: you need to specify the coordinates row-wise
  2. A simple way to convert to radians is just to import radians module from maths
  3. The results that you get at the end needs to be multiplied by earth's radius again to get the answer in meters/kilometers.

Please see the code sample below...

from math import radians
earth_radius = 6371000 # meters in earth
test_radius = 1300000 # meters

test_points = [[32.027240,-81.093190],[41.981876,-87.969982]]
test_points_rad = np.array([[radians(x[0]), radians(x[1])] for x in test_points ])

tree = BallTree(test_points_rad, metric = 'haversine')
ind,results = tree.query_radius(test_points_rad, r=test_radius/earth_radius, 
return_distance  = True)
print(ind)
print(results * earth_radius/1000)