I have two lists of tuples, list1 = [(1.332, 3.23344, 3.22), (2.122, 2.11, 2.33), ... (1, 2, 3)] and list2 = [(4.23, 12.2, 3.333), (1.234, 3.21, 4.342), ... (1.1, 2.2, 3.3)]. Both lists are very long, with millions of entries each. For context, each data point is a position measured in one of two different datasets. I want to match each entry in list1 to an entry in list2 if it is "close enough", meaning the distance between the two positions is less than some threshold value (say 0.1). My initial thought was to use the min function on each entry in list1, as follows:
import numpy as np
import random

def dist(pt1, pt2):
    # Euclidean distance between two 3D points
    return np.sqrt(((pt2[0] - pt1[0]) ** 2) + ((pt2[1] - pt1[1]) ** 2) + ((pt2[2] - pt1[2]) ** 2))

list1 = [(random.random(), random.random(), random.random()) for _ in range(25)]
list2 = [(random.random(), random.random(), random.random()) for _ in range(20)]
threshold = .5
linker = []
for i, entry in enumerate(list1):
    # Closest point in list2 to this entry (linear scan over all of list2)
    m = min(list2, key=lambda x: dist(entry, x))
    if dist(entry, m) < threshold:
        linker.append((i, list2.index(m)))
So this links each index in list1 to an index in list2, provided the nearest point is within the threshold. But I feel like there must be an existing algorithm for exactly this task that is much faster. Is there one?
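For illustration, one common way to speed up this kind of fixed-radius matching is to bucket the points of list2 into a uniform grid whose cell size equals the threshold, so that each point in list1 only has to be compared against the 27 surrounding cells instead of all of list2. The following is a minimal pure-Python sketch under that assumption, not a tuned solution for millions of points; the function names build_grid and link_points are hypothetical, and ties between equally distant points may be resolved differently than in the min-based loop.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Map each integer cell coordinate to the indices of the points inside it."""
    grid = defaultdict(list)
    for j, (x, y, z) in enumerate(points):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(j)
    return grid


def link_points(list1, list2, threshold):
    """Return (i, j) pairs where list2[j] is the nearest point to list1[i]
    and lies strictly closer than threshold."""
    grid = build_grid(list2, threshold)
    linker = []
    for i, p in enumerate(list1):
        cx, cy, cz = (math.floor(c / threshold) for c in p)
        best_j, best_d = None, threshold
        # Any point closer than threshold must lie in one of the 27 neighboring cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        d = math.dist(p, list2[j])
                        if d < best_d:
                            best_j, best_d = j, d
        if best_j is not None:
            linker.append((i, best_j))
    return linker
```

Calling link_points(list1, list2, threshold) returns the same kind of (index in list1, index in list2) pairs as the linker list above. Because the cell size equals the threshold, the work per query depends on how many points fall in the neighboring cells rather than on the full length of list2.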