1

I have lots of points with their coordinates. I want to print at least the three closest neighbors of one point and their distance to that point. How can I do that in Python? In WGS84 system.

NAME    Latitude    Longitude
B   50.94029883 7.019146728
C   50.92073002 6.975268711
D   50.99807758 6.980865543
E   50.98074288 7.035060206
F   51.00696972 7.035993783
G   50.97369889 6.928538763
H   50.94133859 6.927878587
A   50.96712502 6.977825322
Yagiz Degirmenci
  • 16,595
  • 7
  • 65
  • 85
OzcanK
  • 51
  • 2
  • 8
  • Does this answer your question? [Calculate distance between two latitude-longitude points? (Haversine formula)](https://stackoverflow.com/questions/27928/calculate-distance-between-two-latitude-longitude-points-haversine-formula) – nocibambi May 22 '20 at 10:12

1 Answers1

4

Nearest neighbor techniques more efficient for lots of points

  • Brute force (i.e. looping over all the points) complexity is O(N^2)
  • Nearest neighbor algorithms complexity is O(N*log(N))

Nearest Neighbor in Python

  1. BallTree
  2. KdTree
  3. Explaining Nearest Neighbor
  4. BallTree vs. KdTree Performance

Illustration of using BallTree on your problem (related Related Stackoverflow Post)

Code

import pandas as pd
import numpy as np

from sklearn.neighbors import BallTree
from io import StringIO

# Create DataFrame from you lat/lon dataset
data = """NAME Latitude Longitude
B 50.94029883 7.019146728
C 50.92073002 6.975268711
D 50.99807758 6.980865543
E 50.98074288 7.035060206
F 51.00696972 7.035993783
G 50.97369889 6.928538763
H 50.94133859 6.927878587
A 50.96712502 6.977825322"""

# Use StringIO to allow reading of string as CSV
df = pd.read_csv(StringIO(data), sep = ' ')

# Setup Balltree using df as reference dataset
# Use Haversine calculate distance between points on the earth from lat/long
# haversine - https://pypi.org/project/haversine/ 
tree = BallTree(np.deg2rad(df[['Latitude', 'Longitude']].values), metric='haversine')

# Setup distance queries (points for which we want to find nearest neighbors)
other_data = """NAME Latitude Longitude
B_alt 50.94029883 7.019146728
C_alt 50.92073002 6.975268711"""

df_other = pd.read_csv(StringIO(other_data), sep = ' ')

query_lats = df_other['Latitude']
query_lons = df_other['Longitude']

# Find closest city in reference dataset for each in df_other
# use k = 3 for 3 closest neighbors
distances, indices = tree.query(np.deg2rad(np.c_[query_lats, query_lons]), k = 3)

r_km = 6371 # multiplier to convert to km (from unit distance)
for name, d, ind in zip(df_other['NAME'], distances, indices):
  print(f"NAME {name} closest matches:")
  for i, index in enumerate(ind):
    print(f"\t{df['NAME'][index]} with distance {d[i]*r_km:.4f} km")

Output

NAME B_alt closest matches:
    B with distance 0.0000 km
    C with distance 3.7671 km
    A with distance 4.1564 km
NAME C_alt closest matches:
    C with distance 0.0000 km
    B with distance 3.7671 km
    H with distance 4.0350 km
DarrylG
  • 16,732
  • 2
  • 17
  • 23