Fastest way to calculate Euclidean and Minkowski distance between all the vectors in a list of lists python

Question

I have been trying for a while now to calculate the Euclidean and Minkowski distance between all the vectors in a list of lists. I don't have much advanced mathematical knowledge.

I am usually working with 4- or 5-dimensional vectors
The vector list can range in size from 0 to around 200,000 vectors
When calculating distances, all the vectors have the same number of dimensions
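For reference, the Minkowski distance of order p between two n-dimensional vectors x and y is defined as follows. Setting p = 2 gives the Euclidean distance, and p = 1 gives the Manhattan distance.

```latex
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p}, \qquad p \ge 1
```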

I have relied on these two questions during the process:

python numpy euclidean distance calculation between matrices of row vectors

Calculate Euclidean Distance between all the elements in a list of lists python

At first my code looked like this:

import numpy as np

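def euclidean_distance_np(vec_list, single_vec):
    # Distance from single_vec to every vector in vec_list (one value per row)
    dist = (np.array(vec_list, dtype=float) - single_vec) ** 2
    dist = np.sum(dist, axis=1)
    dist = np.sqrt(dist)
    return dist

def minkowski_distance_np(vec_list, single_vec, p_val):
    # dtype=float rather than np.int64, so non-integer coordinates are not truncated
    diff = np.abs(np.array(vec_list, dtype=float) - single_vec)
    dist = (diff ** p_val).sum(axis=1) ** (1 / p_val)
    return dist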

This worked well when I had a small number of vectors: I would calculate the distance from a single vector to all the vectors in the list, then repeat the process for every vector in the list one by one. But once the list grew to 5 or 6 digits in length (tens or hundreds of thousands of vectors), these functions became extremely slow.
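A minimal sketch of that row-by-row procedure, assuming the goal is the full N x N distance matrix (pairwise_by_rows is a hypothetical name). Each iteration only allocates an N x D temporary, so it needs little scratch memory, but the N iterations run in a Python loop, which is why it slows down as N grows.

```python
import numpy as np

def pairwise_by_rows(vec_list, p_val=2):
    """Build the N x N Minkowski distance matrix one row at a time.

    Row i holds the distances from vector i to every vector in the list,
    mirroring the 'one vector against the whole list' approach.
    p_val=2 gives Euclidean distance.
    """
    arr = np.asarray(vec_list, dtype=float)   # shape (N, D)
    n = arr.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        diff = np.abs(arr - arr[i])           # broadcast row i against all rows
        out[i] = (diff ** p_val).sum(axis=1) ** (1.0 / p_val)
    return out
```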

I managed to improve the Euclidean distance calculation like so:

# One array per coordinate; assumes 5-dimensional vectors
x = np.array([v[0] for v in vec_list])
y = np.array([v[1] for v in vec_list])
z = np.array([v[2] for v in vec_list])
w = np.array([v[3] for v in vec_list])
t = np.array([v[4] for v in vec_list])

# x - x.reshape(-1, 1) broadcasts to an N x N matrix of pairwise differences
res = np.sqrt(np.square(x - x.reshape(-1, 1)) +
              np.square(y - y.reshape(-1, 1)) +
              np.square(z - z.reshape(-1, 1)) +
              np.square(w - w.reshape(-1, 1)) +
              np.square(t - t.reshape(-1, 1)))
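The five hard-coded coordinate arrays can be folded into a single (N, D) array with NumPy broadcasting, so the same code handles any number of dimensions. This is a sketch (euclidean_all_pairs is a hypothetical name); note that it materialises an N x N x D temporary, which is D times larger than the final matrix.

```python
import numpy as np

def euclidean_all_pairs(vec_list):
    arr = np.asarray(vec_list, dtype=float)         # shape (N, D)
    # (N, 1, D) - (1, N, D) broadcasts to (N, N, D): every pairwise difference
    diff = arr[:, np.newaxis, :] - arr[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))        # shape (N, N)
```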

But I cannot figure out how to adapt this approach to calculate the Minkowski distance correctly. To be precise, my question is: how can I calculate the Minkowski distance in a similar way to the code above?
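A minimal sketch of one way to do this, assuming the vectors fit in an (N, D) array: it is the same per-coordinate idea as the x/y/z/w/t code, but each coordinate contributes |difference| ** p instead of the square, and the final step is a 1/p power instead of a square root. minkowski_all_pairs is a hypothetical name.

```python
import numpy as np

def minkowski_all_pairs(vec_list, p_val):
    arr = np.asarray(vec_list, dtype=float)   # shape (N, D)
    n = arr.shape[0]
    total = np.zeros((n, n))
    # Loop over coordinates (D is small), not over vectors:
    # each pass adds |col_j - col_i| ** p for all pairs (i, j) at once.
    for col in arr.T:                          # col has shape (N,)
        total += np.abs(col - col.reshape(-1, 1)) ** p_val
    return total ** (1.0 / p_val)
```

With p_val=2 this reproduces the Euclidean result. The fully broadcast equivalent is (np.abs(arr[:, None, :] - arr[None, :, :]) ** p_val).sum(axis=-1) ** (1 / p_val), which avoids the Python loop at the cost of an N x N x D temporary.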

I would also appreciate any ideas for improvement, or better ways to perform the calculations.
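For the Euclidean case only, one commonly used speed-up expands ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so all the cross terms come from a single matrix product. A minimal sketch, assuming a non-empty (N, D) input; euclidean_all_pairs_gram is a hypothetical name. Rounding can make some squared distances slightly negative, hence the clamp, and the result can be less accurate than the direct difference for nearly identical vectors. The identity has no equivalent for general p, so it does not help with Minkowski distance.

```python
import numpy as np

def euclidean_all_pairs_gram(vec_list):
    arr = np.asarray(vec_list, dtype=float)          # shape (N, D), N > 0
    sq_norms = np.einsum("ij,ij->i", arr, arr)       # ||x_i||^2 for every row
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y ; the x.y terms come from one matmul
    sq = sq_norms[:, np.newaxis] + sq_norms[np.newaxis, :] - 2.0 * (arr @ arr.T)
    np.maximum(sq, 0.0, out=sq)                      # clamp tiny negatives from rounding
    return np.sqrt(sq)
```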

For plain arrays (two 2D arrays, not lists), here is an example using Numba: https://stackoverflow.com/a/58752553/4045774 – max9111, Nov 25 '20 at 12:47

Marcin Mrugas · Answer 1 · 2020-11-25T14:45:37.880


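SciPy already implements these distance functions: scipy.spatial.distance.minkowski and scipy.spatial.distance.euclidean. But what you probably need is cdist, which computes the distance between each pair of vectors taken from two collections.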
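A short sketch of the SciPy route using cdist, pdist, and squareform from scipy.spatial.distance; the sample vectors are made up for illustration. For Minkowski distance, the order is passed as the p keyword.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

vecs = np.array([[0, 0, 0, 0, 0],
                 [3, 4, 0, 0, 0],
                 [1, 2, 3, 4, 5]], dtype=float)

# All pairs between two sets (here the same set twice) -> (N, N) matrix
d_euc = cdist(vecs, vecs, metric="euclidean")
d_mink = cdist(vecs, vecs, metric="minkowski", p=3)

# Within a single set, pdist stores each pair once (condensed form,
# length N * (N - 1) / 2); squareform expands it to the (N, N) matrix.
condensed = pdist(vecs, metric="minkowski", p=3)
d_mink_sq = squareform(condensed)
```

pdist does roughly half the work of cdist(vecs, vecs) because it skips the symmetric duplicates and the zero diagonal.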

NumPy is a great tool for matrix manipulation, but it doesn't contain every possible function. You can find many additional features and operations in SciPy, which is more oriented toward mathematics, science, and engineering.

edited Nov 25 '20 at 14:45

answered Nov 25 '20 at 10:48

Marcin Mrugas

Great reference; unfortunately, I got a MemoryError on large matrices. – Gidon, Nov 26 '20 at 07:22
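A MemoryError is expected at the upper end of the sizes in the question: a full N x N matrix of float64 values for N = 200,000 needs 200,000^2 x 8 bytes = 320 GB, whichever function computes it. A minimal sketch of one workaround, assuming each block of rows can be processed and discarded before the next one is computed, is to call cdist on chunks of rows. minkowski_blocks and chunk_rows are hypothetical names, and the default chunk size is arbitrary; tune it to the available memory.

```python
import numpy as np
from scipy.spatial.distance import cdist

def minkowski_blocks(vecs, p_val, chunk_rows=1000):
    """Yield (start, block) pairs where block[k, j] is the Minkowski distance
    from vecs[start + k] to vecs[j]. Only chunk_rows x N distances are held
    in memory at a time."""
    arr = np.asarray(vecs, dtype=float)
    for start in range(0, arr.shape[0], chunk_rows):
        block = cdist(arr[start:start + chunk_rows], arr, metric="minkowski", p=p_val)
        yield start, block
```

What to do with each block depends on the task, for example keeping only the few smallest distances per row or writing the block to disk.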