I am interested in efficiently searching two very long sorted lists to find the closest pair of values. I could of course write two nested for-loops, but that is very slow for long lists. Doing a bit of reading, it seems that NumPy arrays are particularly fast and are similar to Matlab (where I have experience "vectorizing" code that uses for-loops).
One of the answers suggests a fast way of finding the nearest value in a NumPy array. So all I need to do to handle two lists is loop through the values of one list:
import math
import numpy as np

def find_nearest(array, value):
    # Return the element of the sorted array that is closest to value.
    idx = np.searchsorted(array, value, side="left")
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

x1 = np.array([1, 3, 4, 5, 19])
x2 = np.array([6, 18, 24, 36, 37])
dtarray = np.array([])
# For each value in x2, record the distance to its nearest neighbour in x1.
for i in range(x2.size):
    dtarray = np.append(dtarray, math.fabs(x2[i] - find_nearest(x1, x2[i])))
print(dtarray)
I've eliminated one for-loop, and now I'm interested in speeding this up further. It seems like a list comprehension would be useful for this task (and maybe I can figure out how to parallelize it), but I'm having trouble getting it to work:
dtarray2 = [math.fabs(x2[i] - find_nearest(x1, x2[i])) for i in range(x2.size)]
Is this syntax incorrect? I get the following error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<timed exec> in <module>
<timed exec> in <listcomp>(.0)
NameError: name 'x2' is not defined
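For comparison, np.searchsorted also accepts an array of values, so the remaining loop could in principle be dropped altogether. The following is only a minimal sketch of that idea, assuming both arrays are sorted, non-empty, and of a signed numeric dtype; nearest_distances is a hypothetical helper name, not something from the answer mentioned above:

```python
import numpy as np

def nearest_distances(sorted_ref, queries):
    """Distance from each query to its closest value in sorted_ref.

    Hypothetical helper: assumes sorted_ref is sorted and non-empty.
    """
    sorted_ref = np.asarray(sorted_ref)
    queries = np.asarray(queries)
    # Insertion index of every query into sorted_ref, computed in one call.
    idx = np.searchsorted(sorted_ref, queries, side="left")
    # Clamp the neighbour indices so both stay inside the array bounds.
    left = np.clip(idx - 1, 0, sorted_ref.size - 1)
    right = np.clip(idx, 0, sorted_ref.size - 1)
    # The nearest value is whichever neighbour is closer.
    d_left = np.abs(queries - sorted_ref[left])
    d_right = np.abs(queries - sorted_ref[right])
    return np.minimum(d_left, d_right)

x1 = np.array([1, 3, 4, 5, 19])
x2 = np.array([6, 18, 24, 36, 37])
dtarray3 = nearest_distances(x1, x2)
print(dtarray3)
```

With the sample arrays above this should yield the same distances as the loop version (1, 1, 5, 17, 18), though I have not timed it on long inputs.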