The following codes iterates over each element of two array to compute pairwise euclidean distance.
def compute_distances_two_loops(X, Y):
num_test = X.shape[0]
num_train = Y.shape[0]
dists = np.zeros((num_test, num_train))
for i in range(num_test):
for j in range(num_train):
dists[i][j] = np.sqrt(np.sum((X[i] - Y[j])**2))
return dists
The following code serves the same purpose but with single loop.
def compute_distances_one_loop(X, Y):
num_test = X.shape[0]
num_train = Y.shape[0]
dists = np.zeros((num_test, num_train))
for i in range(num_test):
dists[i, :] = np.sqrt(np.sum((Y - X[i])**2, axis=1))
return dists
Below are time comparison for both.
two_loop_time = time_function(compute_distances_two_loops, X, Y)
print ('Two loop version took %f seconds' % two_loop_time)
>> Two loop version took 20.3 seconds
one_loop_time = time_function(compute_distances_one_loop, X, Y)
print ('One loop version took %f seconds' % one_loop_time)
>> One loop version took 80.9 seconds
Both X and Y are numpy.ndarray with shape -
X: (500, 3000) Y: (5000, 3000)
Out of intuition the results are not correct, the single loop should run at least with same speed. What am I missing here ?
PS: The result is not from a single run. I ran the code number of times, on different machines, the results are similar.