Minimize distance between two latitude-longitude points?

Question

I'm looking for a way to add a new column reporting the minimal distance (km) under a condition.

An example makes it clearer:

Ser_Numb        LAT      LONG   VALUE   MIN
       1  74.166061 30.512811       1
       2  72.249672 33.427724       1
       3  67.499828 37.937264       0
       4  84.253715 69.328767       1
       5  72.104828 33.823462       0
       6  63.989462 51.918173       0
       7  80.209112 33.530778       0
       8  68.954132 35.981256       1
       9  83.378214 40.619652       1
      10  68.778571  6.607066       0

So when VALUE=0, I have to find the closest other city (by latitude/longitude) that has VALUE=1 and compute the distance to it.

From another Stack post we have the formula below, but how can I adapt it to take the minimal distance?

from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km

EDIT: Here is what I tried:

df['dist_VALUE']=0

for i in range(len(df[df['VALUE']<1])):
    for j in range(len(df[df['VALUE']>0])):
        (df[df['VALUE']<1].reset_index(drop=True).loc[i,'dist_VALUE'] =
         min(haversine(df[df['VALUE']<1].reset_index(drop=True).loc[i,'LONG'], 
         df[df['VALUE']<1].reset_index(drop=True).loc[i,'LAT'],
         df[df['VALUE']>0].reset_index(drop=True).loc[j,'LONG'], 
         df[df['VALUE']>0].reset_index(drop=True).loc[j,'LAT'])))

VALUE is an integer; LAT and LONG are floats.

For each row where value=0, calculate the distance to all cities that have value=1, and then pick the smallest distance. If you want further help, you will need to show the code of your data structure that holds the cities (and mention the data types). — Ralf, Mar 09 '19 at 13:38
Hey, thanks for your time. I updated the question with what I tried. — Alex Germain, Mar 09 '19 at 14:05

Ralf · Accepted Answer · 2019-03-09T19:42:52.890

Maybe this can help you:

import pandas as pd
# haversine() is the function from the question; it must be defined before running this

df = pd.DataFrame(
    data=[
        [74.166061, 30.512811, 1],
        [72.249672, 33.427724, 1],
        [67.499828, 37.937264, 0],
        [84.253715, 69.328767, 1],
        [72.104828, 33.823462, 0],
        [63.989462, 51.918173, 0],
        [80.209112, 33.530778, 0],
        [68.954132, 35.981256, 1],
        [83.378214, 40.619652, 1],
        [68.778571,  6.607066, 0],
    ],
    columns=['lat', 'long', 'val'])
df['min'] = 0.0  # float column, so distances can be stored without a dtype change
print(df)
# print(df.shape)
# print(df.index)
# print(df.columns)

destination_cities = [
    {
        'i': i,
        'lat': row['lat'],
        'long': row['long'],
    }
    for i, row in df.iterrows()
    if row['val'] == 1]
print('destination_cities')
print(destination_cities)

for i in df.index:
    row = df.loc[i]

    if row['val'] == 0:
        # distance from this city to every city with val == 1
        target_distances = [
            {
                'destination_i': destination['i'],
                'distance': haversine(
                    lon1=row['long'],
                    lat1=row['lat'],
                    lon2=destination['long'],
                    lat2=destination['lat']),
            }
            for destination in destination_cities]
        # keep the closest destination and store its distance
        elem = min(target_distances, key=lambda x: x['distance'])
        df.loc[i, 'min'] = elem['distance']

print(df)

Another approach could be to pre-compute the shortest distance for each city and then use df.apply() to assign the values; maybe this is a little bit faster for you:

df = pd.DataFrame(
    data=[
        [ 1, 74.166061, 30.512811, 1],
        [ 2, 72.249672, 33.427724, 1],
        [ 3, 67.499828, 37.937264, 0],
        [ 4, 84.253715, 69.328767, 1],
        [ 5, 72.104828, 33.823462, 0],
        [ 6, 63.989462, 51.918173, 0],
        [ 7, 80.209112, 33.530778, 0],
        [ 8, 68.954132, 35.981256, 1],
        [ 9, 83.378214, 40.619652, 1],
        [10, 68.778571,  6.607066, 0],
    ],
    columns=['i', 'lat', 'long', 'val'])

# precompute closest distance for each city with val=0 to all cities with val=1
distances = {}
for _, row_orig in df.iterrows():
    if row_orig['val'] == 0:
        distances[row_orig['i']] = min(
            haversine(
                lon1=row_orig['long'],
                lat1=row_orig['lat'],
                lon2=row_dest['long'],
                lat2=row_dest['lat'])
            for _, row_dest in df.iterrows()
            if row_dest['val'] == 1])

df['min'] = df.apply(lambda row: distances.get(row['i'], 0), axis=1)
print(df)
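
Both versions above call haversine() once per pair of cities in pure Python, which is likely what makes 12k x 12k rows slow. A possible alternative is to let NumPy compute the distances by broadcasting. The following is only a sketch under assumptions: the function name min_haversine_km and the chunk_size parameter are made up for this example, it uses the same formula and the same 6371 km radius as the question, and it expects at least one city with val == 1.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # same radius as haversine() in the question


def min_haversine_km(lat_src, lon_src, lat_dst, lon_dst, chunk_size=1000):
    """For each source point, return the distance (km) to its nearest destination.

    Hypothetical helper: inputs are array-likes in decimal degrees.
    """
    lat_src, lon_src, lat_dst, lon_dst = (
        np.radians(np.asarray(a, dtype=float))
        for a in (lat_src, lon_src, lat_dst, lon_dst))
    result = np.empty(lat_src.shape[0])
    # work on chunks of sources so the (chunk x destinations) matrix stays small
    for start in range(0, lat_src.shape[0], chunk_size):
        stop = start + chunk_size
        lat1 = lat_src[start:stop, None]  # column vector, broadcasts against destinations
        lon1 = lon_src[start:stop, None]
        dlat = lat_dst[None, :] - lat1
        dlon = lon_dst[None, :] - lon1
        a = (np.sin(dlat / 2) ** 2
             + np.cos(lat1) * np.cos(lat_dst[None, :]) * np.sin(dlon / 2) ** 2)
        a = np.clip(a, 0.0, 1.0)  # guard against rounding slightly outside [0, 1]
        dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
        result[start:stop] = dist.min(axis=1)  # nearest destination per source
    return result


df = pd.DataFrame(
    data=[
        [74.166061, 30.512811, 1],
        [72.249672, 33.427724, 1],
        [67.499828, 37.937264, 0],
        [84.253715, 69.328767, 1],
        [72.104828, 33.823462, 0],
        [63.989462, 51.918173, 0],
        [80.209112, 33.530778, 0],
        [68.954132, 35.981256, 1],
        [83.378214, 40.619652, 1],
        [68.778571,  6.607066, 0],
    ],
    columns=['lat', 'long', 'val'])

is_src = df['val'] == 0  # cities that need a distance
is_dst = df['val'] == 1  # candidate nearest cities

df['min'] = 0.0
df.loc[is_src, 'min'] = min_haversine_km(
    df.loc[is_src, 'lat'], df.loc[is_src, 'long'],
    df.loc[is_dst, 'lat'], df.loc[is_dst, 'long'])
print(df)
```

With chunk_size=1000 and about 12k destinations, each intermediate matrix holds roughly 12 million floats, so memory use should stay moderate; the value can be lowered if memory is tight.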

Thanks for this, it works quite well. I'll try to reduce the set of cities I compute distances for in order to cut the running time, because I have 12k rows with 0 and about as many with 1. — Alex Germain, Mar 09 '19 at 15:30
Do you think we could reduce the running time by only comparing cities with 'close' latitudes and longitudes? Two hours to compute the distances is a bit long, haha. — Alex Germain, Mar 09 '19 at 18:53
@AlexGermain I added another version of my solution, maybe this one is faster for all your data. — Ralf, Mar 09 '19 at 19:43
