I have been trying for a while now to calculate the Euclidean and Minkowski distance between all the vectors in a list of lists. I don't have much advanced mathematical knowledge.
- I usually work with 4- or 5-dimensional vectors
- The vector list can range in size from 0 to around 200,000
- When calculating the distance, all the vectors will have the same number of dimensions
I have relied on these two questions during the process:
python numpy euclidean distance calculation between matrices of row vectors
Calculate Euclidean Distance between all the elements in a list of lists python
At first my code looked like this:
import numpy as np

def euclidean_distance_np(vec_list, single_vec):
    dist = (np.array(vec_list) - single_vec) ** 2
    dist = np.sum(dist, axis=1)
    dist = np.sqrt(dist)
    return dist

def minkowski_distance_np(vec_list, single_vec, p_val):
    dist = (np.abs(np.array(vec_list, dtype=np.int64) - single_vec) ** p_val).sum(axis=1) ** (1 / p_val)
    return dist
This worked well when I had a small number of vectors. I would calculate the distance from a single vector to all the vectors in the list, then repeat the process for every vector in the list, one by one. Once the list reached 5 or 6 digits in length, these functions became extremely slow.
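To make the one-by-one process concrete, here is a minimal sketch of how the loop was driven. The wrapper name `all_pairs_one_by_one` is hypothetical and only for illustration; the point is that `np.array(vec_list)` is rebuilt from the Python list on every call, so that conversion is repeated once per vector.

```python
import numpy as np

def euclidean_distance_np(vec_list, single_vec):
    # Distance from single_vec to every vector in vec_list.
    # Note: the list is converted to an array on every call.
    dist = (np.array(vec_list) - single_vec) ** 2
    return np.sqrt(np.sum(dist, axis=1))

def all_pairs_one_by_one(vec_list):
    # Hypothetical driver: one call per vector, results stacked row by row.
    return np.array([euclidean_distance_np(vec_list, v) for v in vec_list])

vecs = [[0, 0, 0, 0], [3, 4, 0, 0]]
result = all_pairs_one_by_one(vecs)
```

Row `i` of `result` holds the distances from vector `i` to every vector in the list.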
I managed to improve the Euclidean distance calculation like so:
x = np.array([v[0] for v in vec_list])
y = np.array([v[1] for v in vec_list])
z = np.array([v[2] for v in vec_list])
w = np.array([v[3] for v in vec_list])
t = np.array([v[4] for v in vec_list])
res = np.sqrt(
    np.square(x - x.reshape(-1, 1))
    + np.square(y - y.reshape(-1, 1))
    + np.square(z - z.reshape(-1, 1))
    + np.square(w - w.reshape(-1, 1))
    + np.square(t - t.reshape(-1, 1))
)
However, I cannot figure out how to adapt the method above so that it correctly calculates the Minkowski distance. To be precise, my question is: how can I calculate the Minkowski distance between all the vectors in a way similar to the code above?
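A minimal sketch of what the analogue might look like, assuming the same per-coordinate broadcasting carries over: the square becomes an absolute difference raised to `p_val`, and the square root becomes a `1 / p_val` root. The function name `pairwise_minkowski` is hypothetical, and the sketch has only been checked against small hand-worked cases.

```python
import numpy as np

def pairwise_minkowski(vec_list, p_val):
    # Hypothetical helper: all-pairs Minkowski distance of order p_val.
    arr = np.asarray(vec_list, dtype=np.float64)
    if arr.ndim != 2:
        # Empty list (or otherwise not a list of equal-length vectors).
        return np.zeros((0, 0))
    n, d = arr.shape
    total = np.zeros((n, n))
    for k in range(d):
        col = arr[:, k]
        # (n,) minus (n, 1) broadcasts to an (n, n) matrix of differences.
        total += np.abs(col - col.reshape(-1, 1)) ** p_val
    return total ** (1.0 / p_val)

vecs = [[0, 0, 0, 0, 0], [3, 4, 0, 0, 0]]
d1 = pairwise_minkowski(vecs, 1)   # p = 1 (Manhattan)
d2 = pairwise_minkowski(vecs, 2)   # p = 2 (should match Euclidean)
```

Looping over the coordinates keeps the working memory at one `(n, n)` matrix instead of an `(n, n, d)` intermediate. Even so, the full result has n² entries: at 200,000 vectors that is 4 × 10^10 float64 values, roughly 320 GB, so for the largest lists the matrix would probably have to be computed in row blocks.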
I would also appreciate any ideas for improvement or better ways to perform the calculations.