3

I am getting wildly diverging distances using two approximations to calculate distance between points on Earth's surface. I am using the Haversine (vectorized) approximation and the more precise (presumably) geopy.distance.geodesic .

enter image description hereenter image description here

As you can see I am off by five percent as the distances between points becomes large. Is this divergence due to rounding error in Haversine? Do I indeed trust the Geodesic? Here is code:

import numpy as np
lat = np.linspace(35,45,100)
lon = np.linspace(-120,-110,100)

data = pd.DataFrame({'Latitude':lat,'Longitude':lon})




def Haversine(v):
    """
    distance between two lat,lon coordinates 
    using the Haversine formula. Assumes one
    radius. r = 3,950 to 3,963 mi 
    """
    from timeit import default_timer as timer
    start = timer()
    R = 3958 # radius at 40 deg 750 m elev
    v = np.radians(v)

    dlat = v[:, 0, np.newaxis] - v[:, 0]
    dlon = v[:, 1, np.newaxis] - v[:, 1]
    c = np.cos(v[:,0,None])

    a = np.sin(dlat / 2.0) ** 2 + c * c.T * np.sin(dlon / 2.0) ** 2

    c = 2 * np.arcsin(np.sqrt(a))
    result = R * c
    print(round((timer() - start),3))
    return result



def slowdistancematrix(data):

    from geopy.distance import geodesic
    distance = np.zeros((data.shape[0],data.shape[0]))
    for i in range(data.shape[0]):

        lat_lon_i = data.Latitude.iloc[i],data.Longitude.iloc[i]

        for j in range(i):

            lat_lon_j = data.Latitude.iloc[j],data.Longitude.iloc[j]

            distance[i,j] = geodesic(lat_lon_i, lat_lon_j).miles
            distance[j,i] = distance[i,j] # make use of symmetry

    return distance

distanceG = slowdistancematrix(data)
distanceH = Haversine(data.values)



plt.scatter(distanceH.ravel(),distanceG.ravel()/distanceH.ravel(),s=.5)
plt.ylabel('Geodesic/Haversine')
plt.xlabel('Haversine distance (miles)')
plt.title('all points in distance matrix')

I would rather use the vectorized version becuase it is fast. However,the 5% is too big for me to be comfortable with it. Supposedly Haversine is only suppose to be off by .5%.

UPDATE:

Found error. when implementing the vectorized version I wasn't calculating all the distances between points, but only between some. I updated code to reflect this. Here is what the difference between Haversine and Geodesic are for my domain (25-55* by -125--110):

enter image description here

Pretty darn good!

Bstampe
  • 689
  • 1
  • 6
  • 16

2 Answers2

2

The Haversine formula calculates distances between points on a sphere (the great-circle distance), as does geopy.distance.great_circle.

On the other hand, geopy.distance.geodesic calculates distances between points on an ellipsoidal model of the earth, which you can think of as a "flattened" sphere.

The difference isn't due to rounding so much as they use different formulas, with the geodesic formula more accurately modeling the true shape of the earth.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • Hey thanks. But a whole 5% off is pretty wild isnt it? Haversine is suppose to be within .5% off. The earth is fairly well approximated as a sphere. – Bstampe Oct 15 '19 at 18:00
  • 1
    Your 5% is the difference between Haversine and the geodesic, not between Haversine and the true distance. Given that an ellipsoid is a better approximation than a sphere, I would *expect* the ratio to increase with the absolute distance. – chepner Oct 15 '19 at 18:04
  • True. How can one know the true distance? I would also expect this. But if we kept going to to 3000 miles or so, we would be way off 10% off. Which is absurd. – Bstampe Oct 15 '19 at 18:04
  • It looks like its about 0.5% off before about 100 miles then it starts wildly diverging... – Bstampe Oct 15 '19 at 18:05
  • Direct measurement is needed to know the true distance, but you don't need to know that to explain the difference. – chepner Oct 15 '19 at 18:05
  • Yeah... so the best we can do is explain the difference. Hence the question. – Bstampe Oct 15 '19 at 18:07
  • That's a math question, not a programming question. – chepner Oct 15 '19 at 18:08
  • 1
    The math should only be .5% off. Hence there must be a implementation issue. – Bstampe Oct 15 '19 at 18:11
1

There was a matrix algebra error in the Haversine formula. I updated the code in the question. I am getting much better agreement between Haversine and geodesic now:

enter image description here

On my actual dataset:

enter image description here enter image description here

Bstampe
  • 689
  • 1
  • 6
  • 16