I have an extremely large list of coordinates in the form of a list of tuples.
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
The list of tuples is actually being built by a for loop with an append call, like so:
data = []
for i in source:  # where i is a tuple of the form (x, y)
    data.append(i)
Is there an approach to ensure that the Euclidean distance between all tuples stays above a certain threshold? In this example the distances between (1,1), (1,2), and (2,1) are very small, so I would like to keep only one of those three tuples, resulting in any one of these new lists of tuples:
data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(2,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
data = [(1,2),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21)]
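To make the requirement precise, this is the property I want the output to satisfy (a minimal checker; far_enough is just an illustrative name, and math.dist assumes Python 3.8+):

from itertools import combinations
import math

def far_enough(points, threshold):
    # True if every pair of points is at least `threshold` apart (Euclidean).
    return all(math.dist(a, b) >= threshold
               for a, b in combinations(points, 2))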
I have a brute-force algorithm that iterates through the list, but there should be a more elegant or quicker way to do this. Are there any other methods to speed up this operation? I am expecting lists of ~70k up to 500k tuples.
My method:
from scipy.spatial.distance import euclidean

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
new_data = []
while data:
    check = data.pop()
    # Keep `check` only if no remaining point lies within the threshold.
    flag = True
    for i in data:
        if euclidean(check, i) < 5:
            flag = False
            break
    if flag:
        new_data.append(check)
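For comparison, a grid-based variant of the same greedy filter might look like the sketch below (filter_by_distance is a hypothetical helper, only checked against the toy example above). With the cell size equal to the threshold, any kept point that could conflict with a candidate must sit in one of the 3x3 neighboring cells, so each point is compared against far fewer candidates:

import math

def filter_by_distance(points, threshold):
    # Greedy filter: keep a point only if it is at least `threshold` away
    # from every previously kept point.
    grid = {}  # (cell_x, cell_y) -> kept points falling in that cell
    kept = []
    t2 = threshold * threshold
    for x, y in points:
        cx, cy = math.floor(x / threshold), math.floor(y / threshold)
        ok = True
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                if any((x - px) ** 2 + (y - py) ** 2 < t2
                       for px, py in grid.get((nx, ny), ())):
                    ok = False
                    break
            if not ok:
                break
        if ok:
            kept.append((x, y))
            grid.setdefault((cx, cy), []).append((x, y))
    return kept

print(filter_by_distance(data, 5))

Since kept points are pairwise at least `threshold` apart, each cell can only ever hold a handful of them, so each insertion should touch O(1) candidates regardless of how the input is clustered.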
Additional points:

- Although the list of tuples comes from some iterative function, the order of the tuples is uncertain.
- The actual number of tuples is unknown until the end of the for loop.
- I would rather avoid multiprocessing/multithreading for the speedup in this scenario.
- If necessary I can post some timings, but I don't think it's needed.
- My current solution does n(n-1)/2 distance checks, i.e. O(n^2) time with O(n) space I think, so any improvement would be welcome (see the cKDTree sketch below).
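If building the whole list first is acceptable, another direction (a sketch, not benchmarked at these sizes) is scipy.spatial.cKDTree.query_pairs, which returns all index pairs at distance <= r (note: <=, whereas my loop uses a strict <). Dropping one member of every still-conflicting pair leaves a set with all pairwise distances above the threshold:

import numpy as np
from scipy.spatial import cKDTree

data = [(1,1),(1,11),(1,21),(11,1),(21,1),(11,11),(11,21),(21,11),(21,21),(1,2),(2,1)]
tree = cKDTree(np.asarray(data, dtype=float))
close_pairs = tree.query_pairs(r=5)  # {(i, j) with i < j and dist(data[i], data[j]) <= 5}
dropped = set()
for i, j in sorted(close_pairs):
    # In sorted order every pair (h, i) with h < i has already been seen, so
    # i's status is final here; j is dropped only if it conflicts with a kept point.
    if i not in dropped and j not in dropped:
        dropped.add(j)
new_data = [p for k, p in enumerate(data) if k not in dropped]

Memory should stay manageable as long as the number of close pairs is small; on very dense input query_pairs itself could produce a huge set, in which case the grid sketch above may be the safer option.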