Important: You've said you want to "Extract those points which are at least 3 degrees far from each other", but then you've used the Euclidean distance with math.hypot(). As mentioned by @martineau, this should use the Haversine angular distance.
Since your points are "(longitudes, latitudes in degrees)", they first need to be converted to radians. And the pairs should be flipped so that latitude comes first, as required by the haversine_distances() function. That can be done with:
XY_r = [(math.radians(lat), math.radians(lon)) for lon, lat in XY]
Here's the kicker: none of the combination-making or looping is necessary. If haversine_distances() is passed a list of points, it will calculate the distances between all of them and return the result as an array of arrays. These can then be converted back to degrees and checked; or convert 3 degrees to radians and then check against the haversine distances.
import math
import numpy as np
from sklearn.metrics.pairwise import haversine_distances
XY = [(100, 10), (100, 11), (100, 13), (101, 10), (101, 11), (101, 13), (103, 10), (103, 11), (103, 13)]
# convert to radians and flip so that latitude is first
XY_r = [(math.radians(lat), math.radians(lon)) for lon, lat in XY]
distances = haversine_distances(XY_r) # distances array-of-arrays in RADIANS
dist_criteria = distances >= math.radians(3) # at least 3 degrees (in radians) away
results = [point for point, result in zip(XY, dist_criteria) if np.any(result)]
print(results)
print(len(results))
print('<3 away from all:', set(XY) - set(results))
Output:
[(100, 10), (100, 11), (100, 13), (101, 10), (101, 13), (103, 10), (103, 11), (103, 13)]
8
<3 away from all: {(101, 11)}
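For intuition about what is being computed for a single pair, here is a minimal standard-library sketch of the haversine angular distance. The helper name haversine_rad is hypothetical (it is not part of sklearn or of your code), and it takes (lon, lat) pairs in degrees, like XY. It also shows that checking in radians and checking in degrees are the same test.

```python
import math

def haversine_rad(p1, p2):
    """Angular distance in RADIANS between two (lon, lat) points given in degrees.

    Hypothetical helper for illustration only.
    """
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1
    return 2 * math.asin(min(1.0, math.sqrt(a)))

p1, p2 = (100, 10), (103, 10)           # same latitude, 3 degrees of longitude apart
d = haversine_rad(p1, p2)

far_in_radians = d >= math.radians(3)   # threshold converted to radians
far_in_degrees = math.degrees(d) >= 3   # distance converted back to degrees
euclidean = math.hypot(p2[0] - p1[0], p2[1] - p1[1])

print(math.degrees(d), euclidean)
print(far_in_radians, far_in_degrees)
```

For this pair the Euclidean value is exactly 3, while the angular distance comes out a little under 3 degrees because lines of longitude converge away from the equator. So the two criteria disagree on this pair.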
Wrt the previous edit and your original code:
Your first two attempts are giving empty results because of this:
results = []
for point in XY:
    ...
    for result in results:
results is initialised as an empty list. So the for result in results loop will exit immediately. Nothing inside the loop executes.
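A tiny sketch of that behaviour, using a hypothetical counter named iterations that is not in your code:

```python
results = []
iterations = 0

for result in results:   # nothing to iterate over
    iterations += 1      # never reached

print(iterations)
```

The counter stays at zero because the loop body never runs.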
The 3rd attempt is getting you 32 results because of repetitions. You've got:
for point in XY:
    ...
    for point in XY:
so some of the points you get from the inner loop will be the same point as in the outer loop.
To avoid duplicates in the loops:
Add a check for it and go to the next iteration:
if (x1, y1) == (x2, y2):
    continue
Btw, you're mangling the point variable because it's reused in both loops. It doesn't cause a problem but makes your code harder to debug. Either make them point1 and point2, or even better, instead of for point in XY: x1, y1 = point, you can directly do for x1, y1 in XY; that's called tuple unpacking.
for x1, y1 in XY:
    for x2, y2 in XY:
        if (x1, y1) == (x2, y2):
            continue
        ...
You also need to change results to be a set instead of a list so that the same point is not re-added every time it's at least 3 away from another point. Sets don't allow duplicates, so points don't get repeated in the results.
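To see the difference between the two containers, here is a small sketch that fills a list and a set side by side. The names as_list and as_set are made up for the comparison, and the Euclidean math.hypot() check is kept only to isolate the duplicate issue, not because it is the right distance.

```python
import math

XY = [(100, 10), (100, 11), (100, 13), (101, 10), (101, 11),
      (101, 13), (103, 10), (103, 11), (103, 13)]

as_list = []
as_set = set()

for x1, y1 in XY:
    for x2, y2 in XY:
        if (x1, y1) == (x2, y2):
            continue  # skip comparing a point with itself
        # Euclidean distance, used here only to demonstrate the duplicates
        if math.hypot(x2 - x1, y2 - y1) >= 3:
            as_list.append((x1, y1))  # appended once per far-away partner
            as_set.add((x1, y1))      # stored at most once

print(len(as_list), len(as_set))
```

The list ends up longer than the set because each point is appended once for every partner that is far enough away, while the set keeps a single copy.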
Use itertools.combinations() to get unique pairs of points without repetitions. This allows you to skip the duplicate check (unless XY actually has duplicate points) and brings the previous block down to one for-loop:
import itertools
import math
results = set()  # unique results
for (x1, y1), (x2, y2) in itertools.combinations(XY, r=2):
    distance = math.hypot(x2 - x1, y2 - y1)  # WRONG! see above
    if distance >= 3:
        # add both points
        results.update({(x1, y1), (x2, y2)})
print(results)
print(len(results))
print('<3 away from all:', set(XY) - results)
The (wrong) output:
{(103, 11), (100, 13), (101, 13), (100, 10), (103, 10), (101, 10), (103, 13), (100, 11)}
8
<3 away from all: {(101, 11)}
(The result is the same but merely by coincidence of the input data.)
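If you'd rather keep the loop than use sklearn, a rough sketch of the same combinations() approach with the Euclidean distance swapped for a haversine angular distance could look like this. The helper haversine_deg is a hypothetical name written with the standard library only, and it assumes (lon, lat) pairs in degrees as in XY.

```python
import itertools
import math

def haversine_deg(p1, p2):
    """Angular distance in DEGREES between two (lon, lat) points given in degrees.

    Hypothetical helper, not part of the original code.
    """
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

XY = [(100, 10), (100, 11), (100, 13), (101, 10), (101, 11),
      (101, 13), (103, 10), (103, 11), (103, 13)]

results = set()  # unique results
for p1, p2 in itertools.combinations(XY, r=2):
    if haversine_deg(p1, p2) >= 3:
        results.update({p1, p2})  # add both points of the pair

print(sorted(results))
print(len(results))
print('<3 away from all:', set(XY) - results)
```

This is pairwise Python looping, so for large inputs the vectorised haversine_distances() call is likely the better choice.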