Multidimensional Euclidean Distance in Python

Question

I want to calculate the Euclidean distance in multiple dimensions (24 dimensions) between 2 arrays. I'm using NumPy and SciPy.

Here is my code:

import numpy,scipy;

A=numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07, 7.74233e+06, 2.85839e+08, 2.30168e+08, 5.6919e+08, 168989, 7.48866e+06, 1.45261e+06, 7.49496e+07, 2.13295e+07, 3.74361e+08, 54.5, 3349.39, 262.614, 16175.8, 3693.79, 205865]);

B=numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151246, 6795630, 4566625, 2.0355328e+08, 1.4250515e+08, 3.2699482e+08, 95635, 4470961, 589043, 29729866, 6124073, 222.3]);

I used scipy.spatial.distance.cdist(A[numpy.newaxis,:],B,'euclidean') to calculate the Euclidean distance.

But it gave me an error:

raise ValueError('XB must be a 2-dimensional array.');

I don't seem to understand it.

I looked up scipy.spatial.distance.pdist but don't understand how to use it.

Is there a better way to do it?

Perhaps [`scipy.spatial.distance.euclidean`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html#scipy.spatial.distance.euclidean)? — Michael Mior, Feb 23 '12 at 14:16
So, you have 2, 24-dimensional points? In that case, @Mr.E's answer is the best option. However, when you have more than 2 points, the various `scipy.spatial.distance` functions will be more efficient. — Joe Kington, Feb 23 '12 at 14:26
I thought perhaps I was missing something. Posted as an answer if that solves your problem. — Michael Mior, Feb 23 '12 at 17:24
I would like to add something about the error you received a long time ago, as it might help others. According to the docs, cdist requires both inputs to be 2-dimensional arrays. Your first array is 2-dimensional (you passed `A[numpy.newaxis,:]`), but B is still 1-dimensional. Writing `B[numpy.newaxis,:]` should therefore solve the error. – Julian Gorfer, Sep 19 '20 at 22:36

Michael Mior · Accepted Answer · 2019-02-04T02:32:06.927

26

Perhaps scipy.spatial.distance.euclidean?

Examples

>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
>>> distance.euclidean([1, 1, 0], [0, 1, 0])
1.0

edited Feb 04 '19 at 02:32

answered Feb 23 '12 at 17:24

Michael Mior


score 14 · Answer 2 · answered Feb 23 '12 at 14:15


Use either

numpy.sqrt(numpy.sum((A - B)**2))

or more simply

numpy.linalg.norm(A - B)
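As a quick check that the two expressions agree, here is a minimal sketch. The small arrays and the variable names (`explicit`, `via_norm`, `points`, `origin`) are made up for illustration; the last lines assume you may later want one distance per row, which `numpy.linalg.norm` supports through its `axis` argument.

```python
import numpy

# Illustrative 3-D points (not the 24-D arrays from the question)
A = numpy.array([1.0, 2.0, 3.0])
B = numpy.array([4.0, 6.0, 3.0])

# Explicit formula: square root of the sum of squared differences
explicit = numpy.sqrt(numpy.sum((A - B) ** 2))

# Same quantity via the L2 norm of the difference vector
via_norm = numpy.linalg.norm(A - B)

# Row-wise distances for a batch of points against a single point
points = numpy.array([[0.0, 0.0], [3.0, 4.0]])
origin = numpy.array([0.0, 0.0])
row_distances = numpy.linalg.norm(points - origin, axis=1)
```

Both `explicit` and `via_norm` come out as 5.0 here, since the difference vector is (-3, -4, 0).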


YXD


Xavier Guihot · Answer 3 · 2019-07-28T05:30:02.317

10

Starting with Python 3.8, you can use the standard library's math module and its new dist function, which returns the Euclidean distance between two points (given as lists or tuples of coordinates):

from math import dist

dist([1, 0, 0], [0, 1, 0]) # 1.4142135623730951
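The same call works for any number of dimensions, as long as both points have the same length. A small sketch, assuming Python 3.8 or later; the points `p` and `q` and the flag `mismatch_raises` are illustrative names only.

```python
from math import dist, isclose, sqrt

# Two illustrative 4-D points given as tuples
p = (1.0, 2.0, 3.0, 4.0)
q = (1.0, 0.0, 3.0, 1.0)

# Squared differences are 0, 4, 0 and 9, so the distance is sqrt(13)
d = dist(p, q)

# Points with different numbers of coordinates are rejected with ValueError
try:
    dist((0, 0), (0, 0, 0))
    mismatch_raises = False
except ValueError:
    mismatch_raises = True
```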

edited Jul 28 '19 at 05:30

answered Jan 15 '19 at 10:46

Xavier Guihot



And it's noticeably faster than scipy's euclidean function! +1 – mauriii Aug 26 '20 at 07:09

score 7 · Answer 4 · answered Feb 23 '12 at 14:25


A and B are 2 points in the 24-D space. You should use scipy.spatial.distance.euclidean.

See the SciPy documentation for scipy.spatial.distance.euclidean.

scipy.spatial.distance.euclidean(A, B)


Ade YU


score 5 · Answer 5 · answered Dec 12 '17 at 21:29

Since all of the above answers rely on numpy and/or scipy, I just wanted to point out that something really simple can be done with reduce here:

from functools import reduce  # reduce is not a builtin in Python 3
from math import sqrt

def n_dimensional_euclidean_distance(a, b):
   """
   Returns the euclidean distance between two points with n >= 1 dimensions
   :param a: sequence of numbers
   :param b: sequence of numbers
   :return: the euclidean distance as a float
   """
   dimension = len(a)  # note: raises an IndexError if len(b) < len(a); extra items in b are ignored

   return sqrt(reduce(lambda i, j: i + ((a[j] - b[j]) ** 2), range(dimension), 0))

This sums (a[j] - b[j])^2 over every dimension j and takes the square root of the total.
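For comparison, the same pure-Python computation can be written without reduce. This is a sketch of an alternative, not the answer's own method: the function name `euclidean_distance` is hypothetical, and the explicit length check is an added assumption about how mismatched inputs should be handled.

```python
from math import sqrt

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(a) != len(b):
        # Assumption: mismatched inputs should fail loudly rather than be truncated by zip
        raise ValueError("both points must have the same number of dimensions")
    # Pair up coordinates, square each difference, sum, then take the root
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d = euclidean_distance((1, 0, 0), (0, 1, 0))
```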

score 4 · Answer 6 · answered Feb 23 '12 at 14:29

Apart from the already mentioned ways of computing the Euclidean distance, here's one that's close to your original code:

scipy.spatial.distance.cdist([A], [B], 'euclidean')

or

scipy.spatial.distance.cdist(numpy.atleast_2d(A), numpy.atleast_2d(B), 'euclidean')

This returns a 1×1 numpy.ndarray holding the L2 distance.
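Since the question also asks how pdist is meant to be used, here is a minimal sketch of both functions on small made-up points. The arrays `A`, `B`, `C` below are illustrative 3-D points, not the 24-D arrays from the question. cdist compares two 2-D arrays of shape (n_points, n_dims); pdist takes a single 2-D array and returns the distances between all pairs of its rows in condensed form.

```python
import numpy
from scipy.spatial import distance

# Illustrative 3-D points
A = numpy.array([1.0, 0.0, 0.0])
B = numpy.array([0.0, 1.0, 0.0])
C = numpy.array([0.0, 0.0, 0.0])

# cdist needs both arguments to be 2-D, so each point becomes a 1-row matrix
d_ab = distance.cdist(A[numpy.newaxis, :], B[numpy.newaxis, :], 'euclidean')
scalar = d_ab[0, 0]  # pull the single distance out of the 1x1 result

# pdist takes one (n_points, n_dims) array; the condensed result is ordered
# (A,B), (A,C), (B,C)
points = numpy.vstack([A, B, C])
condensed = distance.pdist(points, 'euclidean')

# squareform expands the condensed vector into a symmetric n x n matrix
square = distance.squareform(condensed)
```

For exactly two points the scalar functions from the other answers are simpler; cdist and pdist pay off once there are many points to compare at once.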

score 1 · Answer 7 · answered Sep 18 '21 at 10:33

Writing your own square root of a sum of squares is not always safe.

You can use math.hypot, numpy.hypot or a scipy distance function rather than writing numpy.sqrt(numpy.sum((A - B)**2)) or (i**2 + j**2)**0.5. The hand-written forms square each term first, so for very large or very small values the intermediate result can overflow or underflow.


Speed wise

%%timeit
math.hypot(*(A - B))
# 3 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%%timeit
numpy.sqrt(numpy.sum((A - B)**2))
# 5.65 µs ± 50.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Safety wise

Underflow

import numpy as np

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

i, j = np.float64(1e+200), np.float64(1e+200)  # NumPy scalars; with plain Python floats, i**2 raises OverflowError
np.sqrt(i**2+j**2)
# inf

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
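The two-argument examples above extend to n dimensions. Below is a rough sketch of one way to do that: divide every difference by the largest absolute difference before squaring, so the intermediate squares stay at or below 1. The helper name `safe_euclidean` is hypothetical, the rescaling is a substituted technique rather than what any library is confirmed to do internally, and the multi-argument `math.hypot` call assumes Python 3.8 or later.

```python
import math
import numpy as np

def safe_euclidean(a, b):
    """Euclidean distance that rescales by the largest |difference| before squaring.

    Assumes non-empty inputs of equal length.
    """
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    scale = diff.max()
    if scale == 0.0:
        return 0.0  # identical points
    # Every ratio is <= 1, so squaring cannot overflow
    return float(scale * np.sqrt(np.sum((diff / scale) ** 2)))

# Illustrative points whose squared differences would overflow a float64
big_a = [1e200, 1e200, 0.0]
big_b = [0.0, 0.0, 0.0]

d_scaled = safe_euclidean(big_a, big_b)

# math.hypot accepts any number of coordinates from Python 3.8 on
d_hypot = math.hypot(*(x - y for x, y in zip(big_a, big_b)))
```

Both values stay finite here, whereas the naive sqrt-of-sum form would return inf for the same inputs.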