I am trying to calculate the Euclidean distance between every pair of rows of two matrices using only matrix operations in NumPy, without any for loops.

If I needed to calculate this for just two vectors it would be trivial, since I would use the formula for Euclidean distance:

D(x, y) = ‖y − x‖ = √(xᵀx + yᵀy − 2xᵀy)

and converting that into NumPy would be as simple as doing something like this:

np.sqrt(a@a.T - 2*a@b.T + b@b.T)

However, doing it for two matrices of size P×Q and R×Q is proving tricky without for loops.

To give a concrete example of what I am trying to do, if I had this matrix called A:

[[ 4  9  9]
 [11  8  1]
 [ 2  6  4]
 [ 4  7 11]
 [ 6  7  9]]

and a matrix called B:

[[ 5  8  8]
 [10  5  1]
 [ 6  6  9]
 [ 2  1  2]
 [ 9  1  3]
 [ 6  1  6]
 [ 9 10  3]
 [10  4  8]]

Then I would want to calculate the resulting P×R matrix C, where C[i, j] is the distance between row i of A and row j of B:

[[ 1.73205081 10.77032961  3.60555128 10.81665383 11.18033989  8.77496439
   7.87400787  7.87400787]
 [ 9.21954446  3.16227766  9.64365076 11.44552314  7.54983444  9.94987437
   3.46410162  8.1240384 ]
 [ 5.38516481  8.60232527  6.40312424  5.38516481  8.66025404  6.70820393
   8.1240384   9.16515139]
 [ 3.31662479 11.83215957  3.         11.         11.18033989  8.06225775
   9.89949494  7.34846923]
 [ 1.73205081  9.16515139  1.         10.04987562  9.          6.70820393
   7.34846923  5.09901951]]

I believe the top answer in this Stack Overflow thread does exactly that, but it is in MATLAB and I am having trouble converting it into a Python NumPy solution.

knowledge_seeker

2 Answers

Try NumPy broadcasting:

dist_mat = np.sum((a[:,None] - b)**2, axis=-1)**.5

Output:

array([[ 1.73205081, 10.77032961,  3.60555128, 10.81665383, 11.18033989,
         8.77496439,  7.87400787,  7.87400787],
       [ 9.21954446,  3.16227766,  9.64365076, 11.44552314,  7.54983444,
         9.94987437,  3.46410162,  8.1240384 ],
       [ 5.38516481,  8.60232527,  6.40312424,  5.38516481,  8.66025404,
         6.70820393,  8.1240384 ,  9.16515139],
       [ 3.31662479, 11.83215957,  3.        , 11.        , 11.18033989,
         8.06225775,  9.89949494,  7.34846923],
       [ 1.73205081,  9.16515139,  1.        , 10.04987562,  9.        ,
         6.70820393,  7.34846923,  5.09901951]])
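To see why the one-liner works, it can help to look at the intermediate shapes. The following is a minimal sketch, assuming a and b are the 5×3 and 8×3 arrays from the question; the name diff is introduced here only for illustration.

```python
import numpy as np

# Matrices from the question: a is (5, 3), b is (8, 3).
a = np.array([[4, 9, 9], [11, 8, 1], [2, 6, 4], [4, 7, 11], [6, 7, 9]])
b = np.array([[5, 8, 8], [10, 5, 1], [6, 6, 9], [2, 1, 2],
              [9, 1, 3], [6, 1, 6], [9, 10, 3], [10, 4, 8]])

# a[:, None] has shape (5, 1, 3). Subtracting b of shape (8, 3)
# broadcasts to (5, 8, 3): one difference vector per (row of a, row of b).
diff = a[:, None] - b

# Sum the squared differences over the last axis, then take the root.
dist_mat = np.sqrt(np.sum(diff**2, axis=-1))

print(diff.shape)      # (5, 8, 3)
print(dist_mat.shape)  # (5, 8)
```

Note that the intermediate array holds P×R×Q elements, so memory use grows quickly for large inputs.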
Quang Hoang

Use np.linalg.norm across last axis:

>>> np.linalg.norm((a[:,None] - b), axis=-1)
array([[ 1.73205081, 10.77032961,  3.60555128, 10.81665383, 11.18033989,
         8.77496439,  7.87400787,  7.87400787],
       [ 9.21954446,  3.16227766,  9.64365076, 11.44552314,  7.54983444,
         9.94987437,  3.46410162,  8.1240384 ],
       [ 5.38516481,  8.60232527,  6.40312424,  5.38516481,  8.66025404,
         6.70820393,  8.1240384 ,  9.16515139],
       [ 3.31662479, 11.83215957,  3.        , 11.        , 11.18033989,
         8.06225775,  9.89949494,  7.34846923],
       [ 1.73205081,  9.16515139,  1.        , 10.04987562,  9.        ,
         6.70820393,  7.34846923,  5.09901951]])
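If the (P, R, Q) intermediate array created by broadcasting is too large, the expansion from the question, xᵀx + yᵀy − 2xᵀy, can probably be applied to all row pairs at once with a single matrix product. This is a minimal sketch of that idea, not the method used above; the helper name pairwise_dist is hypothetical, and the clipping step is an added precaution against floating-point rounding.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between every row of a (P, Q) and every row of b (R, Q)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_sq = np.sum(a * a, axis=1)[:, None]  # (P, 1): squared norm of each row of a
    b_sq = np.sum(b * b, axis=1)[None, :]  # (1, R): squared norm of each row of b
    sq = a_sq + b_sq - 2.0 * (a @ b.T)     # (P, R): squared distances
    # Rounding can leave tiny negative values; clip them before the root.
    return np.sqrt(np.maximum(sq, 0.0))

a = np.array([[4, 9, 9], [11, 8, 1], [2, 6, 4], [4, 7, 11], [6, 7, 9]])
b = np.array([[5, 8, 8], [10, 5, 1], [6, 6, 9], [2, 1, 2],
              [9, 1, 3], [6, 1, 6], [9, 10, 3], [10, 4, 8]])

c = pairwise_dist(a, b)
# Compare against the norm-based result from this answer.
print(np.allclose(c, np.linalg.norm(a[:, None] - b, axis=-1)))  # True
```

The trade-off is that this form only ever allocates (P, R) arrays, but it can lose some precision when two rows are nearly identical, because it subtracts large, nearly equal numbers.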
Sayandip Dutta