Extract those points which are at least 3 degrees far from each other

Question

I have 9 points (longitudes, latitudes in degrees) on the surface of Earth follows.

XY = [(100, 10), (100, 11), (100, 13), (101, 10), (101, 11), (101, 13), (103, 10), (103, 11), (103, 13)]
print (len(XY))
# 9

I wanted to extract those points which are at least 3 degrees far from each other.

I tried it as follows.

results = []

for point in XY:
    x1,y1 = point

    for result in results:
        x2,y2 = result

        distance = math.hypot(x2 - x1, y2 - y1)

        if distance >= 3:
            results.append(point)

print (results)

But output is empty.

edit 2

from sklearn.metrics.pairwise import haversine_distances
from math import radians

results = []

for point in XY:
    x1,y1 = [radians(_) for _ in point]

    for result in results:  
        distance = haversine_distances((x1,y1), (x2,y2))    
        print (distance)

        if distance >= 3:
            results.append(point)

print (results)

Still the result is empty

edit 3

results = []

for point in XY:
    x1,y1 = point

    for point in XY:        
        x2,y2 = point

        distance = math.hypot(x2 - x1, y2 - y1)
        print (distance)

        if distance >= 3:
            results.append(point)

print (results)
print (len(results))
# 32   # unexpected len

`math.hypot()` calculates the Euclidean distance between to points, however you need to determine the angular distance between them relative to some line segment (perhaps the horizontal axis). — martineau, Mar 06 '21 at 08:52
@martineau i tried with https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html — Roman, Mar 06 '21 at 09:02
If the values are longitude and latitude, then the angular distance between them is that between two imaginary line from the center of the Earth to those points. Regardless of how or what's being calculated, you're right that there's are problems with the way the loop has been written. You need to iterate through _pairs_ of points and compare the calculated value. You're also going to need to compare each one against all the others. — martineau, Mar 06 '21 at 10:30

aneroid · Accepted Answer · 2021-03-06T13:48:33.510

Important: You've said you want to "Extract those points which are at least 3 degrees far from each other" but then you've used the Euclidean distance with math.hypot(). As mentioned by @martineau, this should use the Haversine angular distance.

Since your points are "(longitudes, latitudes in degrees)", they first need to be converted to radians. And the pairs should be flipped so that latitude comes first, as required by the haversine_distances() function. That can be done with:

XY_r = [(math.radians(lat), math.radians(lon)) for lon, lat in XY]

Here's the kicker - none of the combnation-making or looping is necesssary. If haversine_distances() is passed in a list of points, it will calculate the distances between all of them and provide a result as an array of arrays. These can then be converted back to degrees and checked; or convert 3 degrees to radians and then check against h-dists.

import math
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

XY = [(100, 10), (100, 11), (100, 13), (101, 10), (101, 11), (101, 13), (103, 10), (103, 11), (103, 13)]

# convert to radians and flip so that latitude is first
XY_r = [(math.radians(lat), math.radians(lon)) for lon, lat in XY]
distances = haversine_distances(XY_r)  # distances array-of-arrays in RADIANS
dist_criteria = distances >= math.radians(3)  # at least 3 degrees (in radians) away
results = [point for point, result in zip(XY, dist_criteria) if np.any(result)]

print(results)
print(len(results))
print('<3 away from all:', set(XY) - set(results))

Output:

[(100, 10), (100, 11), (100, 13), (101, 10), (101, 13), (103, 10), (103, 11), (103, 13)]
8
<3 away from all: {(101, 11)}

Wrt the previous edit and your original code:

Your first two attempts are giving empty results because of this:

results = []

for point in XY:
    ...
    for result in results:

results is initialised as an empty list. So the for result in results loop will directly exit. Nothing inside the loop executes.

The 3rd attempt is getting you 32 results because of repetitions. You've got:

for point in XY:
    ...
    for point in XY:

so some points you get will be the same point.

To avoid duplicates in the loops:

Add a check for it and go to the next iteration:
```
if (x1, y1) == (x2, y2):
    continue
```
Btw, you're mangling the point variable because it's reused in both loops. It doesn't cause a problem but makes your code harder to debug. Either make them point1 and point2, or even better, instead of for point in XY: x1, y1 = point, you can directly do for x1, y1 in XY - that's called tuple unpacking.
```
for x1, y1 in XY:
    for x2, y2 in XY:
        if (x1, y1) == (x2, y2):
            continue
        ...
```
You also need to change result to be a set instead of a list so that the same point is not re-added to the results when it's more than 3 away from another point. Sets don't allow duplicates, that way points don't get repeated in results.

Use itertools.combinations() to get unique pairs of points without repetitions. This allows you to skip the duplicate check (unless XY actually has duplicate points) and brings the previous block down to one for-loop:
```
import itertools
import math

results = set()  # unique results
for (x1, y1), (x2, y2) in itertools.combinations(XY, r=2):
    distance = math.hypot(x2 - x1, y2 - y1)  # WRONG! see above
    if distance >= 3:
        # add both points
        results.update({(x1, y1), (x2, y2)})

print(results)
print(len(results))
print('<3 away from all:', set(XY) - results)
```
The (wrong) output:
```
{(103, 11), (100, 13), (101, 13), (100, 10), (103, 10), (101, 10), (103, 13), (100, 11)}
8
<3 away from all: {(101, 11)}
```
(The result is the same but merely by coincidence of the input data.)

Since the coordinates of the points are longitude and latitude, the OP clearly needs to be computing Haversine angular distances, not Euclidean — and needs to compare all possible pairings of them. — martineau, Mar 06 '21 at 10:41
@martineau I did mention that at the bottom. But makes sense to provide the _fully correct_ answer. I'll also need to convert the [lat-long from degrees to radians](https://stackoverflow.com/a/10140110/1431750). Will edit. — aneroid, Mar 06 '21 at 10:47
It's not fully correct if the calculations are the wrong ones. BTW, there are `math` module functions to do [angular conversions](https://docs.python.org/3/library/math.html#angular-conversion). — martineau, Mar 06 '21 at 12:17
@martineau Yes, agreed with that, so have updated my answer. Including being able to skip all the looping/combination making. Did use the angular conversion functions instead of the formula by hand. — aneroid, Mar 06 '21 at 13:45
@thunder: Well, kinda sorta. The results are the same because across a relative small angular distance of the surface of a sphere the size of the Earth the area described _is_ almost a flat and therefore approximately Euclidean. — martineau, Mar 06 '21 at 14:40
@thunder As a numerical example showing the difference for points far away: Consider the two points `(0, 0), (30, 70)`. The Haversine Angular Distance in degrees is `72.77` but the Euclidean distance is `76.15`. For more details on which kinds of point combinations give a bigger difference, see this [gis-exchange answer](https://gis.stackexchange.com/a/58668). — aneroid, Mar 06 '21 at 15:18

Extract those points which are at least 3 degrees far from each other

1 Answers1