
I need help calculating the distance between two points, where each point is a longitude/latitude pair. I have a .txt file that contains longitude and latitude in columns like this:

-116.148000 32.585000
-116.154000 32.587000
-116.159000 32.584000

The columns do not have headers. I have many more latitudes and longitudes.

So far, I have come up with this code:

from math import sin, cos, sqrt, atan2, radians

R = 6370  # approximate Earth radius in km
# Placeholder values: the first two rows of the file (longitude, latitude)
lat1 = radians(32.585)
lon1 = radians(-116.148)
lat2 = radians(32.587)
lon2 = radians(-116.154)

dlon = lon2 - lon1
dlat = lat2 - lat1

# Haversine formula
a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
c = 2 * atan2(sqrt(a), sqrt(1 - a))
distance = R * c
print(distance)

Many of the answers I've seen on Stack Overflow for calculating the distance between longitudes and latitudes assign specific values to the longitude and latitude.

I would like the longitude and latitude to be read from their columns in the file, and the formula to run over all of the points and calculate the distances between them.

I have not been able to come up with something to do this. Any help would be appreciated.

  • I think that this question covers many topics (from reading a file to calculating a formula): you should focus your question on the central topic so that we can help you more easily. You should also give descriptive names to variables, so that readers can understand. – Don Jul 31 '19 at 15:25
  • What should the output be? It sounds like the most reasonable output would be a distance matrix or square dataframe. – johnchase Jul 31 '19 at 15:33
  • @johnchase A square dataframe –  Jul 31 '19 at 15:40

3 Answers


Based on the question, it sounds like you would like to calculate the distance between all pairs of points. SciPy has built-in functionality to do this.

My suggestion is to first write a function that calculates the distance between two points, or to use an existing one such as the one in geopy mentioned in another answer.

from math import sin, cos, sqrt, atan2, radians

def get_distance(point1, point2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    R = 6370  # approximate Earth radius in km
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c
    return distance

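Then you can pass this function into scipy.spatial.distance.cdist:

import scipy.spatial.distance

# latitude_column and longitude_column stand for your own column names;
# the order must be (latitude, longitude) to match get_distance
all_points = df[[latitude_column, longitude_column]].values

dm = scipy.spatial.distance.cdist(all_points,
                                  all_points,
                                  get_distance)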

As a bonus you can convert the distance matrix to a data frame if you wish to add the index to each point:

pd.DataFrame(dm, index=df.index, columns=df.index)

NOTE: I realized I am assuming, possibly incorrectly, that you are using pandas.
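If pandas and SciPy are not available, the same square matrix can probably be built with the standard library alone. The following is only a sketch: `read_points`, `haversine_km` and `EARTH_RADIUS_KM` are names made up for illustration, the `io.StringIO` object stands in for the real file, and it assumes every non-blank line holds exactly a longitude followed by a latitude, as in the question.

```python
import io
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption; the function above uses 6370)

def read_points(handle):
    """Parse 'longitude latitude' lines into (latitude, longitude) tuples."""
    points = []
    for line in handle:
        if line.strip():  # skip blank lines
            lon, lat = map(float, line.split())
            points.append((lat, lon))
    return points

def haversine_km(point1, point2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, point1)
    lat2, lon2 = map(radians, point2)
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))

# Stand-in for open("YOURFILENAME.txt"); these are the rows shown in the question.
sample = io.StringIO(
    "-116.148000 32.585000\n"
    "-116.154000 32.587000\n"
    "-116.159000 32.584000\n"
)
points = read_points(sample)

# Square matrix: entry [i][j] is the distance from point i to point j.
matrix = [[haversine_km(p, q) for q in points] for p in points]
for row in matrix:
    print(" ".join(f"{d:8.3f}" for d in row))
```

The nested list comprehension evaluates every pair twice, which is fine for a few thousand points; for larger files a vectorised approach is likely to be faster.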

johnchase

If you don't mind importing a few libraries, this can be done very simply.

With pandas, you can read your text file into a dataframe, which makes working with tabular data like this super easy.

import pandas as pd
# The file lists longitude first, then latitude, so name the columns in that order
df = pd.read_csv('YOURFILENAME.txt', delimiter=' ', header=None, names=('longitude', 'latitude'))

Then you can use the geopy library to calculate the distance.
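A minimal sketch of the geopy step, assuming geopy is installed and that its `geodesic` function is an acceptable distance measure (it uses an ellipsoidal Earth model, so results will differ slightly from the haversine formula). The names `sample_rows` and `geodesic_matrix_km` are made up for illustration; the rows are the three shown in the question.

```python
# Rows as they appear in the file: (longitude, latitude)
sample_rows = [
    (-116.148, 32.585),
    (-116.154, 32.587),
    (-116.159, 32.584),
]

def geodesic_matrix_km(lon_lat_rows):
    """Return a square nested list of geodesic distances in km."""
    # Imported here so geopy is only required when the function is called.
    from geopy.distance import geodesic

    # geopy expects (latitude, longitude), so swap the file's column order.
    points = [(lat, lon) for lon, lat in lon_lat_rows]
    return [[geodesic(p, q).km for q in points] for p in points]
```

Calling `geodesic_matrix_km(sample_rows)` should give a 3 by 3 nested list, which can be wrapped in `pd.DataFrame(...)` if a square dataframe is wanted.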

maltodextrin

Another solution is to read the data with numpy and apply the haversine formula to whole arrays at once. The previous answers, which use other libraries, will also work.

import numpy as np

#read in the file (fname is the path to your text file); check the data structure using data.shape
data = np.genfromtxt(fname)

#Create the function of the haversine equation with numpy
def haversine(Olat,Olon, Dlat,Dlon):

    radius = 6371.  # km

    d_lat = np.radians(Dlat - Olat)
    d_lon = np.radians(Dlon - Olon)
    a = (np.sin(d_lat / 2.) * np.sin(d_lat / 2.) +
         np.cos(np.radians(Olat)) * np.cos(np.radians(Dlat)) *
         np.sin(d_lon / 2.) * np.sin(d_lon / 2.))
    c = 2. * np.arctan2(np.sqrt(a), np.sqrt(1. - a))
    d = radius * c
    return d

#Call function with your data from some specified point
haversine(10,-110,data[:,1],data[:,0])

In this case, you can pass a single number for Olat,Olon and an array for Dlat,Dlon or vice versa and it will return an array of distances.

For example:

haversine(20,-110,np.arange(0,50,5),-120)

OUTPUT

array([2476.17141062, 1988.11393057, 1544.75756103, 1196.89168113,
   1044.73497113, 1167.50120561, 1499.09922502, 1934.97816445,
   2419.40097936, 2928.35437829])
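Since the function broadcasts, the square matrix asked for in the comments can probably be obtained in one call by passing the origins as a column vector and the destinations as a row vector. This is a sketch under that assumption: the `data` array below hard-codes the three rows from the question in place of the `np.genfromtxt` result, and `dm` is just an illustrative name.

```python
import numpy as np

def haversine(Olat, Olon, Dlat, Dlon):
    """Haversine distance in km; inputs in degrees, scalars or arrays."""
    radius = 6371.  # km
    d_lat = np.radians(Dlat - Olat)
    d_lon = np.radians(Dlon - Olon)
    a = (np.sin(d_lat / 2.) ** 2 +
         np.cos(np.radians(Olat)) * np.cos(np.radians(Dlat)) *
         np.sin(d_lon / 2.) ** 2)
    c = 2. * np.arctan2(np.sqrt(a), np.sqrt(1. - a))
    return radius * c

# Stand-in for np.genfromtxt(fname): columns are longitude, latitude.
data = np.array([[-116.148, 32.585],
                 [-116.154, 32.587],
                 [-116.159, 32.584]])
lon, lat = data[:, 0], data[:, 1]

# An (n, 1) column against a (1, n) row broadcasts to an (n, n) matrix,
# so dm[i, j] is the distance from point i to point j.
dm = haversine(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
print(dm.shape)  # (3, 3)
```

The full matrix needs memory proportional to the square of the number of points, so for very large files it may be necessary to process the origins in chunks.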
BenT
  • My answer only assumes pandas for loading data into a dataframe. None of the functionality is based on pandas. You would still need to load data for this answer – johnchase Jul 31 '19 at 21:00
  • Good point. Wasn't sure OP needed help with reading in data – BenT Jul 31 '19 at 21:10