I'm looking for a command-line solution that finds the closest pairs of points in a CSV list of coordinates.

A similar question was answered for Excel, but I need a somewhat different solution.

I'm NOT looking for the nearest point to every point, but for the pairs of points with the smallest distance between them.

I would like to match many power plants from GEO, so a (Python?) command-line tool would be great.

Here is an example dataset:

Chicoasén Dam,16.941064,-93.100828
Tuxpan Oil Power Plant,21.014891,-97.334492
Petacalco Coal Power Plant,17.983575,-102.115252
Angostura Dam,16.401226,-92.778926
Tula Oil Power Plant,20.055825,-99.276857
Carbon II Coal Power Plant,28.467176,-100.698559
Laguna Verde Nuclear Power Plant,19.719095,-96.406347
Carbón I Coal Power Plant,28.485238,-100.69096
Manzanillo I Oil Power Plant,19.027372,-104.319274
Tamazunchale Gas Power Plant,21.311282,-98.756266

The tool should print "Carbon II" and "Carbón I", because this pair has the minimal distance.

A code fragment could be:

from math import radians, cos, sin, asin, sqrt
import csv

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 

    km = 6371 * c
    return km 

with open('mexico-test.csv', newline='') as csvfile:
    so = csv.reader(csvfile, delimiter=',', quotechar='|')
    data = []
    for row in so:
        data.append(row)

# haversine() expects longitude first; the CSV columns are name, latitude, longitude
print(haversine(-100.698559, 28.467176, -100.69096, 28.485238))
pickenpack

1 Answer

A simple method is to compute all pairs, then find the minimum pair, where the "size" of a pair is defined as the distance between the two points in the pair:

from itertools import combinations

# Rows are name, latitude, longitude; haversine() takes longitude first.
closest = min(combinations(data, 2),
              key=lambda p: haversine(float(p[0][2]), float(p[0][1]), float(p[1][2]), float(p[1][1])))

To get the five smallest, use heapq.nsmallest with the same key. It accepts any iterable, so there is no need to build a list or heapify it first.

import heapq

# Rows are name, latitude, longitude; haversine() takes longitude first.
five_smallest = heapq.nsmallest(
    5,
    combinations(data, 2),
    key=lambda p: haversine(float(p[0][2]), float(p[0][1]), float(p[1][2]), float(p[1][1])))
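
For a command-line run, the pieces above could be combined roughly as follows. This is a minimal sketch under stated assumptions, not a finished tool: load_points, pair_distance, and closest_pairs are hypothetical helper names, the rows are assumed to be name, latitude, longitude with no header line, and the inline sample (four rows from the question) stands in for the CSV file. It still compares every pair, so the work grows quadratically with the number of plants.

```python
import csv
import heapq
import io
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))


def load_points(csvfile):
    """Hypothetical helper: parse name,lat,lon rows into (name, lat, lon) tuples."""
    # Assumes no header line; blank rows are skipped.
    return [(row[0], float(row[1]), float(row[2]))
            for row in csv.reader(csvfile) if row]


def pair_distance(pair):
    """Hypothetical helper: distance in km between the two points of a pair."""
    (_, lat1, lon1), (_, lat2, lon2) = pair
    # haversine() takes longitude first, the rows store latitude first.
    return haversine(lon1, lat1, lon2, lat2)


def closest_pairs(points, n=1):
    """Hypothetical helper: the n pairs with the smallest distance, nearest first."""
    return heapq.nsmallest(n, combinations(points, 2), key=pair_distance)


# Sample rows copied from the question; stands in for the real CSV file.
sample = io.StringIO(
    "Tuxpan Oil Power Plant,21.014891,-97.334492\n"
    "Tula Oil Power Plant,20.055825,-99.276857\n"
    "Carbon II Coal Power Plant,28.467176,-100.698559\n"
    "Carbón I Coal Power Plant,28.485238,-100.69096\n"
)

for first, second in closest_pairs(load_points(sample), n=2):
    km = pair_distance((first, second))
    print(f"{first[0]} <-> {second[0]}: {km:.2f} km")
```

To read a real file instead, the io.StringIO sample could be replaced with open(path, newline='', encoding='utf-8'), with path taken from sys.argv[1].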
chepner