I have a matrix A and want to calculate the distance matrix D from it iteratively. I want to calculate it step by step because I later want to include some if-statements in the iteration process.
My code right now looks like this:
```python
import numpy as np
from scipy.spatial import distance


def create_data_matrix(n, m):
    # n samples from an m-dimensional standard normal distribution
    mean = np.zeros(m)
    cov = np.eye(m, dtype=float)
    data_matrix = np.random.multivariate_normal(mean, cov, n)
    return data_matrix


def create_full_distance(A):
    # pairwise Euclidean distances; np.triu keeps only the upper triangle
    distance_matrix = np.triu(distance.squareform(distance.pdist(A, "euclidean")), 0)
    return distance_matrix


matrix_a = create_data_matrix(1000, 2)
distance_from_numpy = create_full_distance(matrix_a)

# the same distances, computed pair by pair with an explicit double loop
matrix_b = np.empty((1000, 1000))
for idx, line in enumerate(matrix_a):
    for j, line2 in enumerate(matrix_a):
        matrix_b[idx, j] = distance.euclidean(line, line2)
```
The matrices "distance_from_numpy" and "matrix_b" contain the same distances ("distance_from_numpy" only keeps the upper triangle because of "np.triu"), but "matrix_b" takes far longer to compute, even though "matrix_a" is only a (1000x2) matrix. I know that "distance.pdist()" is very fast, but I am not sure how to use it inside an iteration.
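Much of the slowness is per-call overhead: the loop makes 1,000,000 calls to "distance.euclidean", and each call converts and validates its inputs before doing the tiny 2-element computation. The loop also computes every pair twice, plus the zero diagonal. The sketch below is only an illustration, assuming a full symmetric matrix is wanted. It keeps the explicit double loop but visits only pairs with j > i and uses the standard-library "math.dist" (Python 3.8+) on plain Python lists. The function name "pairwise_loop" is hypothetical.

```python
import math

import numpy as np


def pairwise_loop(A):
    """Fill a symmetric distance matrix with an explicit double loop.

    Only pairs with j > i are computed; each value is mirrored into the
    lower triangle, and the diagonal stays 0.
    """
    rows = A.tolist()  # plain Python lists: math.dist has little overhead on these
    n = len(rows)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            # an if-statement on d could go here
            D[i, j] = d
            D[j, i] = d
    return D
```

This is still a Python-level loop over about n²/2 pairs, so it remains far slower than "pdist". It only removes the avoidable overhead.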
My question is: why is the double for loop so slow, and how can I speed it up while still keeping the iteration (since I want to include if-statements there)?
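A common compromise is to keep the Python loop over rows but vectorize the inner loop with NumPy broadcasting. For 1000 points, that means 1,000 iterations instead of 1,000,000, and each iteration computes a whole row of distances in one call. The sketch below assumes you only need the upper triangle, as "create_full_distance" returns. The function name "upper_distance_rows" is hypothetical.

```python
import numpy as np


def upper_distance_rows(A):
    """Upper-triangular Euclidean distance matrix, built one row at a time.

    The outer loop stays in Python (room for per-row if-statements); the
    distances from row i to all later rows are computed in one NumPy call.
    """
    n = A.shape[0]
    D = np.zeros((n, n))
    for i in range(n - 1):
        diff = A[i + 1:] - A[i]                 # shape (n - i - 1, m) via broadcasting
        row = np.sqrt((diff ** 2).sum(axis=1))  # distances from point i to points i+1..n-1
        # inspect `row` here (e.g. row.min()) before storing it
        D[i, i + 1:] = row
    return D
```

"np.linalg.norm(diff, axis=1)" computes the same row and is equally valid.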
Edit: for context, I want to keep the iteration because I'd like to stop it as soon as one of the distances is smaller than a specific number.
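For the early-stopping case, the same row-wise idea can return as soon as a row contains a distance below the threshold, so later rows are never computed. This is a minimal sketch under that assumption. The function name "find_close_pair" and its return convention "(i, j, d)" are hypothetical.

```python
import numpy as np


def find_close_pair(A, threshold):
    """Return (i, j, d) for the first pair i < j with distance d < threshold.

    Rows are scanned in order and the search stops at the first row that
    has a close partner. Returns None if no pair is closer than threshold.
    """
    n = A.shape[0]
    for i in range(n - 1):
        row = np.linalg.norm(A[i + 1:] - A[i], axis=1)  # distances to later points
        hits = np.flatnonzero(row < threshold)
        if hits.size:  # stop the iteration at the first close pair
            k = int(hits[0])
            return i, i + 1 + k, float(row[k])
    return None
```

If you only need to know whether any distance falls below the threshold, "distance.pdist(A).min() < threshold" may already be fast enough for 1000 points, because "pdist" computes all of the roughly 500,000 distances in compiled code.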