
I want to calculate the Euclidean distance between 2 arrays in multiple dimensions (24 dimensions). I'm using NumPy/SciPy.

Here is my code:

import numpy
import scipy.spatial.distance

A = numpy.array([116.629, 7192.6, 4535.66, 279714, 176404, 443608, 295522, 1.18399e+07, 7.74233e+06, 2.85839e+08, 2.30168e+08, 5.6919e+08, 168989, 7.48866e+06, 1.45261e+06, 7.49496e+07, 2.13295e+07, 3.74361e+08, 54.5, 3349.39, 262.614, 16175.8, 3693.79, 205865])

B = numpy.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 151246, 6795630, 4566625, 2.0355328e+08, 1.4250515e+08, 3.2699482e+08, 95635, 4470961, 589043, 29729866, 6124073, 222.3])

I used scipy.spatial.distance.cdist(A[numpy.newaxis,:],B,'euclidean') to calculate the Euclidean distance.

But it gave me an error:

raise ValueError('XB must be a 2-dimensional array.');

I don't understand this error.

I looked up scipy.spatial.distance.pdist, but I don't understand how to use it.

Is there a better way to do it?
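Because the question asks how pdist is used, here is a minimal sketch. pdist takes a single 2-D array with one point per row and returns a "condensed" 1-D array of all pairwise distances. The short placeholder arrays stand in for the question's A and B and are assumptions, not the original data.

```python
import numpy
from scipy.spatial.distance import pdist

# Placeholder 24-dimensional points standing in for the question's A and B
A = numpy.arange(24, dtype=float)
B = numpy.zeros(24)

# pdist expects ONE 2-D array with one point per row: shape (2, 24) here
points = numpy.vstack([A, B])

# For n points pdist returns n*(n-1)/2 distances in condensed form;
# with just 2 points that is a single value
condensed = pdist(points, 'euclidean')
distance = condensed[0]
print(condensed.shape, distance)
```

With only two points, pdist is more than you need. It becomes useful once many points are stacked as rows.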

Michael Mior
garak
  • Perhaps [`scipy.spatial.distance.euclidean`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html#scipy.spatial.distance.euclidean)? – Michael Mior Feb 23 '12 at 14:16
  • So, you have two 24-dimensional points? In that case, @Mr.E's answer is the best option. However, when you have more than 2 points, the various `scipy.spatial.distance` functions will be more efficient. – Joe Kington Feb 23 '12 at 14:26
  • I thought perhaps I was missing something. Posted as an answer in case that solves your problem. – Michael Mior Feb 23 '12 at 17:24
  • I would like to say something about the error you received a long time ago, since it might help others. According to the docs, cdist requires both XA and XB to be 2-dimensional arrays (with the same number of columns). You made A 2-dimensional with `A[numpy.newaxis,:]`, so B needs the same treatment: writing `B[numpy.newaxis,:]` should fix the error. – Julian Gorfer Sep 19 '20 at 22:36
  • @JoeKington Who is Mr.E!? :) – jtlz2 Sep 02 '21 at 17:01
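The fix described in the comment above can be sketched as follows. Placeholder 24-D arrays stand in for the question's data. Once both inputs are reshaped to (1, 24), cdist returns a 1×1 matrix.

```python
import numpy
from scipy.spatial.distance import cdist

A = numpy.arange(24, dtype=float)   # placeholder for the question's A
B = numpy.zeros(24)                 # placeholder for the question's B

# A[numpy.newaxis, :] has shape (1, 24), but B on its own is 1-D with shape (24,),
# which is what triggers "XB must be a 2-dimensional array."
XA = A[numpy.newaxis, :]
XB = B[numpy.newaxis, :]            # now shape (1, 24) as well

# Result has one row per point in XA and one column per point in XB: shape (1, 1)
result = cdist(XA, XB, 'euclidean')
distance = result[0, 0]
print(result.shape, distance)
```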

7 Answers

Perhaps scipy.spatial.distance.euclidean?

Examples

>>> from scipy.spatial import distance
>>> distance.euclidean([1, 0, 0], [0, 1, 0])
1.4142135623730951
>>> distance.euclidean([1, 1, 0], [0, 1, 0])
1.0
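Applied to the question, euclidean accepts the two 1-D arrays directly, so no reshaping is needed. The sketch below uses placeholder 24-D arrays instead of the question's data and checks the result against the explicit formula.

```python
import numpy
from scipy.spatial import distance

A = numpy.arange(24, dtype=float)   # placeholder 24-D point
B = numpy.zeros(24)                 # placeholder 24-D point

# euclidean() takes two 1-D arrays of equal length and returns a scalar
d = distance.euclidean(A, B)

# Sanity check against sqrt(sum((A - B)**2))
assert numpy.isclose(d, numpy.sqrt(numpy.sum((A - B) ** 2)))
print(d)
```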
Michael Mior

Use either

numpy.sqrt(numpy.sum((A - B)**2))

or more simply

numpy.linalg.norm(A - B)
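A small check that both expressions agree, using hand-picked 3-D vectors whose distance is exactly 5. The second part is an illustration that goes beyond the original answer. It relies on numpy.linalg.norm's axis argument to get one distance per row when several points are stacked, which relates to the comment about having more than two points.

```python
import numpy

A = numpy.array([1.0, 2.0, 3.0])
B = numpy.array([4.0, 6.0, 3.0])

# Both compute sqrt((1-4)**2 + (2-6)**2 + (3-3)**2) = sqrt(9 + 16 + 0)
d_explicit = numpy.sqrt(numpy.sum((A - B) ** 2))
d_norm = numpy.linalg.norm(A - B)
print(d_explicit, d_norm)  # 5.0 5.0

# With several points stacked as rows, axis=1 gives one distance per row to B
points = numpy.array([[1.0, 2.0, 3.0],
                      [4.0, 6.0, 3.0],
                      [4.0, 2.0, 7.0]])
print(numpy.linalg.norm(points - B, axis=1))
```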
YXD

Starting with Python 3.8, you can use the standard library's math module and its new dist function, which returns the Euclidean distance between two points (given as lists or tuples of coordinates):

from math import dist

dist([1, 0, 0], [0, 1, 0]) # 1.4142135623730951
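A slightly fuller sketch (Python 3.8+, standard library only). It shows that math.dist handles any number of dimensions, 24 included, and raises ValueError when the two points have different lengths. The 24-D points are placeholders.

```python
from math import dist, sqrt

# 3-4-5 right triangle: the distance between (0, 0) and (3, 4) is exactly 5
print(dist((0, 0), (3, 4)))  # 5.0

# Any number of dimensions works, e.g. 24, as long as both points match
p = [float(k) for k in range(24)]   # placeholder 24-D point
q = [0.0] * 24                      # placeholder 24-D point
print(dist(p, q), sqrt(sum(k * k for k in p)))

# Points with different numbers of coordinates are rejected
try:
    dist([1, 2, 3], [1, 2])
except ValueError as exc:
    print("ValueError:", exc)
```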
Xavier Guihot

A and B are 2 points in the 24-D space. You should use scipy.spatial.distance.euclidean.

Doc here

scipy.spatial.distance.euclidean(A, B)
Ade YU

Since all of the above answers rely on numpy and/or scipy, I just wanted to point out that something really simple can be done with reduce here:

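from functools import reduce
from math import sqrt

def n_dimensional_euclidean_distance(a, b):
    """
    Returns the Euclidean distance between two points with the same number of dimensions
    :param a: sequence of numbers (e.g. a tuple)
    :param b: sequence of numbers, same length as a
    :return: the Euclidean distance as a float
    """
    dimension = len(a)  # note: raises IndexError if b is shorter than a; extra items in b are silently ignored

    # accumulate (a[j] - b[j])**2 over every index j, then take the square root
    return sqrt(reduce(lambda i, j: i + ((a[j] - b[j]) ** 2), range(dimension), 0))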

This sums (a[j] - b[j])^2 over every index j and takes the square root of the total. It works for any number of dimensions, including n = 1.
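Below is a usage sketch. The reduce version from this answer is restated so the snippet runs on its own. For comparison there is a hypothetical variant, `euclidean_distance_checked`, which is not part of the original answer. It uses sum() and zip() and rejects mismatched lengths up front rather than relying on an IndexError.

```python
from functools import reduce
from math import sqrt


def n_dimensional_euclidean_distance(a, b):
    # reduce-based version from the answer above (docstring omitted)
    return sqrt(reduce(lambda i, j: i + ((a[j] - b[j]) ** 2), range(len(a)), 0))


def euclidean_distance_checked(a, b):
    """Hypothetical sum()/zip() variant that validates the lengths first."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(n_dimensional_euclidean_distance((0, 0), (3, 4)))  # 5.0
print(euclidean_distance_checked((1, 2, 3), (4, 6, 3)))  # 5.0
```

Without the explicit check, zip() on its own would silently stop at the shorter input.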

Fohlen

Apart from the already mentioned ways of computing the Euclidean distance, here's one that's close to your original code:

scipy.spatial.distance.cdist([A], [B], 'euclidean')

or

scipy.spatial.distance.cdist(np.atleast_2d(A), np.atleast_2d(B), 'euclidean')

This returns a 1×1 np.ndarray holding the L2 distance.
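The sketch below shows both the 1×1 case and what cdist returns when each argument holds several points (one per row). The small 3-D arrays are illustrative assumptions. With m rows in the first argument and n rows in the second, the result is an m×n matrix.

```python
import numpy
from scipy.spatial.distance import cdist

A = numpy.array([1.0, 2.0, 3.0])
B = numpy.array([4.0, 6.0, 3.0])

# Wrapping each 1-D point in a list turns it into a 1-row 2-D input
single = cdist([A], [B], 'euclidean')
print(single.shape)   # (1, 1)
print(single[0, 0])   # the scalar distance

# m points vs n points gives an (m, n) matrix of pairwise distances
XA = numpy.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
XB = numpy.array([[4.0, 6.0, 3.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
matrix = cdist(XA, XB, 'euclidean')
print(matrix.shape)   # (2, 3)
```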

Fred Foo

Writing your own square root of a sum of squares is not always safe

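You can use math.hypot (which accepts any number of coordinates since Python 3.8), numpy.hypot (element-wise, two arguments at a time) or a scipy distance function rather than writing numpy.sqrt(numpy.sum((A - B)**2)) or (i**2 + j**2)**0.5. Depending on the magnitude of your values, squaring them in the hand-written versions can overflow or underflow.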

refer

Speed-wise

%%timeit
math.hypot(*(A - B))
# 3 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%%timeit
numpy.sqrt(numpy.sum((A - B)**2))
# 5.65 µs ± 50.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
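%%timeit is an IPython/Jupyter cell magic. Outside a notebook, the standard timeit module can run a similar comparison. This is a sketch with placeholder 24-D data. Absolute timings vary by machine, so none are shown.

```python
import math
import timeit

import numpy

A = numpy.arange(24, dtype=float)   # placeholder 24-D points
B = numpy.zeros(24)

# math.hypot accepts any number of arguments on Python 3.8+
t_hypot = timeit.timeit(lambda: math.hypot(*(A - B)), number=10_000)
t_numpy = timeit.timeit(lambda: numpy.sqrt(numpy.sum((A - B) ** 2)), number=10_000)
print(f"math.hypot: {t_hypot:.4f} s, numpy.sqrt/sum: {t_numpy:.4f} s for 10,000 calls")
```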

Safety-wise

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

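i, j = np.float64(1e+200), np.float64(1e+200)  # NumPy floats; with plain Python floats, i**2 raises OverflowError instead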
np.sqrt(i**2+j**2)
# inf

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
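The same overflow shows up in the 24-dimensional case. The sketch below uses a placeholder difference vector with huge components. It compares the naive formula, math.hypot with n arguments (Python 3.8+), and a hand-rolled rescaling, which is a common technique but not one this answer mentions: divide by the largest magnitude before squaring, then multiply back.

```python
import math

import numpy

# Placeholder 24-D difference vector with huge components (each 1e200)
d = numpy.full(24, 1e200)

# Naive formula: 1e200**2 = 1e400 overflows float64 to inf (warning silenced here)
with numpy.errstate(over="ignore"):
    naive = numpy.sqrt(numpy.sum(d ** 2))

# math.hypot with n arguments avoids the intermediate overflow
safe = math.hypot(*d)

# Rescaling: divide by the largest magnitude before squaring, multiply back after
scale = numpy.max(numpy.abs(d))
scaled = scale * numpy.sqrt(numpy.sum((d / scale) ** 2)) if scale > 0 else 0.0

print(naive, safe, scaled)
```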
eroot163pi