26

The Problem

Imagine I am stood in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am stood in?

Inputs

  • A coordinate pair (x,y) representing the location I am stood at.
  • A set of coordinate pairs [(a1,b1), (a2,b2)...] where each coordinate pair represents one airport.

Desired Output

A coordinate pair (a,b) from the set of airport coordinate pairs representing the closest airport to the point (x,y).

Inefficient Solution

Here is my inefficient attempt at solving this problem. It is clearly linear in the length of the set of airports.

shortest_distance = None
shortest_distance_coordinates = None

point = (50.776435, -0.146834)

for airport in airports:
    distance = compute_distance(point, airport)
    if distance < shortest_distance or shortest_distance is None:
        shortest_distance = distance
        shortest_distance_coordinates = airport

The Question

How can this solution be improved? This might involve some way of pre-filtering the list of airports based on the coordinates of the location we are currently stood at, or sorting them in a certain order beforehand.

Kieran
  • 2,554
  • 3
  • 26
  • 38
  • It can not be improved significantly without any additional knowledge of the problem (i.e. the fact that there is at least one airport within the same langtitude might've helped). To filter airports, you will still need to look at each one of them, so your complexity will stay O(n) (unless, of course, you are doing something terribly complex in `compute_distance()`, which I doubt since you are probably just doing Haversine distance) – Dmitry Torba Aug 23 '16 at 18:04
  • 1
    See https://en.wikipedia.org/wiki/Nearest_neighbor_search for an overview of algorithms and data structures. – NPE Aug 23 '16 at 18:06
  • @DmitryTorba Thanks for your comment. Is this necessarily true? What if we sort the list beforehand in a specific way? – Kieran Aug 23 '16 at 18:19
  • @NPE Thanks for the link, I will have a look to see if there's any stuff that can be applied here. – Kieran Aug 23 '16 at 18:20
  • 1
    Check out the answer to this problem using scipy.spatial.KDTree, a datastructure allowing you to search n-dimensional points in n logn. http://stackoverflow.com/questions/10818546/finding-index-of-nearest-point-in-numpy-arrays-of-x-and-y-coordinates – aberger Aug 23 '16 at 18:32

4 Answers4

46

Using a k-dimensional tree:

>>> from scipy import spatial
>>> airports = [(10,10),(20,20),(30,30),(40,40)]
>>> tree = spatial.KDTree(airports)
>>> tree.query([(21,21)])
(array([ 1.41421356]), array([1]))

Where 1.41421356 is the distance between the queried point and the nearest neighbour and 1 is the index of the neighbour.

See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query

Juddling
  • 4,594
  • 8
  • 34
  • 40
6

If your coordinates are unsorted, your search can only be improved slightly assuming it is (latitude,longitude) by filtering on latitude first as for earth

1 degree of latitude on the sphere is 111.2 km or 69 miles

but that would not give a huge speedup.

If you sort the airports by latitude first then you can use a binary search for finding the first airport that could match (airport_lat >= point_lat-tolerance) and then only compare up to the last one that could match (airport_lat <= point_lat+tolerance) - but take care of 0 degrees equaling 360. While you cannot use that library directly, the sources of bisect are a good start for implementing a binary search.

While technically this way the search is still O(n), you have much fewer actual distance calculations (depending on tolerance) and few latitude comparisons. So you will have a huge speedup.

janbrohl
  • 2,626
  • 1
  • 17
  • 15
  • This is the most promising answer so far. Implementation-wise, I am storing my airports in an SQL database. So I could perform the tolerance queries at the SQL level and then run the distance algorithm on the results. – Kieran Aug 23 '16 at 21:33
  • That would be the best as it ist much faster that way. (works best if you have an index on the latitude) – janbrohl Aug 23 '16 at 22:07
4

From this SO question:

import numpy as np
def closest_node(node, nodes):
    nodes = np.asarray(nodes)
    deltas = nodes - node
    dist_2 = np.einsum('ij,ij->i', deltas, deltas)
    return np.argmin(dist_2)

where node is a tuple with two values (x, y) and nodes is an array of tuples with two values ([(x_1, y_1), (x_2, y_2),])

Community
  • 1
  • 1
  • This code doesn't make much sense here on it's own. It looks like it's trying to optimise the distance calculation. I'm looking for a way to whittle down the list of airports quickly, either by pre-sorting or pre-filtering. Hope this makes sense. – Kieran Aug 23 '16 at 18:21
  • You asked _How can this solution be improved?_ and I answered with a piece of code that goes _better_. Then, if you want some filtering, it's _another kind_ of improvement (or not), which doesn't make mine less. @Kieran –  Aug 23 '16 at 18:23
  • I deliberately omitted the detail of `compute_distance` - we assume we have an efficient method of computing the distance :) – Kieran Aug 23 '16 at 18:25
  • 1
    @Kieran, all right. I will keep my answer here, just in case it is useful for other users. –  Aug 23 '16 at 18:26
1

The answer of @Juddling is great, but KDTree does not support haversine distance, which is better suited for latitude/longitude coordinates. For the haversine distance you can use BallTree. Please note, that you need to convert your coordinates to radians first.

from math import radians
from sklearn.neighbors import BallTree
import numpy as np

airports = [(10,10),(20,20),(30,30),(40,40)]
airports_rad = np.array([[radians(x[0]), radians(x[1])] for x in airports ])
tree = BallTree(airports_rad , metric = 'haversine')
result = tree.query([(radians(21),radians(21))])
print(result)

gives

(array([[0.02391369]]), array([[1]], dtype=int64))

To convert the distance to meters you need to multiply by the earth radius (in meters).

earth_radius = 6371000 # meters in earth
print(result[0][0] * earth_radius)
[152354.11114795]
PeterD
  • 1,331
  • 12
  • 22