0

I have an array like this and I have to find the distance between each points. How could I do so in python with numpy?

array([[  8139, 112607],
       [  8139, 115665],
       [  8132, 126563],
       [  8193, 113938],
       [  8193, 123714],
       [  8156, 120291],
       [  8373, 125253],
       [  8400, 131442],
       [  8400, 136354],
       [  8401, 129352],
       [  8439, 129909],
       [  8430, 135706],
       [  8430, 146359],
       [  8429, 139089],
       [  8429, 133243]])
Ali_Sh
  • 2,667
  • 3
  • 43
  • 66
S -
  • 11
  • 2
  • Does this answer your question? [Efficiently Calculating a Euclidean Distance Matrix Using Numpy](https://stackoverflow.com/questions/22720864/efficiently-calculating-a-euclidean-distance-matrix-using-numpy) – Ali_Sh Jan 18 '22 at 19:02

2 Answers2

1

Let's minimize this problem down to 4 points:

points = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714]])

In general, you need to do 2 steps:

  • Make an indices of pairs of points you want to take
  • Apply np.hypot for these pairs.

TL;DR

Making an indices of points

There are many ways of how you would like to create pairs of indices for each pair of points you'd like to take. But where do they come from? In every case it's a good idea to start building them from adjancency matrix.

Case 1

In the most common way you can start from building it like so:

adjacency = np.ones(shape=(len(points), len(points)), dtype=bool)
>>> adjacency
[[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]

It corresponds to indices you need to take like so:

adjacency_idx_view = np.transpose(np.nonzero(adjacency))
for n in adjacency_idx_view.reshape(len(points), len(points), 2):
>>> print(n.tolist())
[[0, 0], [1, 0], [2, 0], [3, 0]]
[[0, 1], [1, 1], [2, 1], [3, 1]]
[[0, 2], [1, 2], [2, 2], [3, 2]]
[[0, 3], [1, 3], [2, 3], [3, 3]]

And this is how you collect them:

x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 0],
       [0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 1],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2],
       [3, 3]], dtype=int64)

It could be done also manually like in @ Corralien's answer:

x = np.repeat(np.arange(len(points)), len(points))
y = np.tile(np.arange(len(points)), len(points))

Case 2

In previous case every pair of point is duplicated. There are also pairs with points duplicating. A better option is to omit this excessive data and take only pairs with index of first point being less than index of the second one:

adjacency = np.less.outer(np.arange(len(points)), np.arange(len(points)))
>>> print(adjacency)
[[False  True  True  True]
 [False False  True  True]
 [False False False  True]
 [False False False False]]
x, y = np.nonzero(adjacency)

This is not used widely. Although this lays beyond the hood of np.triu_indices. Hence, as an alternative, we could use:

x, y = np.triu_indices(len(points), 1)

And this results in:

>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [0, 4],
       [1, 2],
       [1, 3],
       [1, 4],
       [2, 3],
       [2, 4],
       [3, 4]])

Case 3 You could also try omit only pairs of duplicated points and leave pairs with points being swapped. As in Case 1 it costs 2x memory and consumption time so I'll leave it for demonstration purposes only:

adjacency = ~np.identity(len(points), dtype=bool)
>>> adjacency
array([[False,  True,  True,  True],
       [ True, False,  True,  True],
       [ True,  True, False,  True],
       [ True,  True,  True, False]])
x, y = np.nonzero(adjacency)
>>> np.transpose([x, y])
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 0],
       [1, 2],
       [1, 3],
       [2, 0],
       [2, 1],
       [2, 3],
       [3, 0],
       [3, 1],
       [3, 2]], dtype=int64)

I'll leave making x and y manually (without masking) as an exercise for the others.

Apply np.hypot

Instead of np.sqrt(np.sum((a - b) ** 2, axis=1)) you could do np.hypot(np.transpose(a - b)). I'll take my Case 2 as my index generator:

def distance(points):
    x, y = np.triu_indices(len(points), 1)
    x_coord, y_coord = np.transpose(points[x] - points[y])
    return np.hypot(x_coord, y_coord)

>>> distance(points)
array([10898.00224812,  1727.84403231,  8049.18113848, 12625.14736548,
        2849.65296133,  9776.        ])
mathfux
  • 5,759
  • 1
  • 14
  • 34
0

You can use np.repeat and np.tile to create all combinations then compute the euclidean distance:

xy = np.array([[8139, 115665], [8132, 126563], [8193, 113938], [8193, 123714],
               [8156, 120291], [8373, 125253], [8400, 131442], [8400, 136354],
               [8401, 129352], [8439, 129909], [8430, 135706], [8430, 146359],
               [8429, 139089], [8429, 133243]])

a = np.repeat(xy, len(xy), axis=0)
b = np.tile(xy, [len(xy), 1])
d = np.sqrt(np.sum((a - b) ** 2, axis=1))

The output of d is (196,) which is 14 x 14.

Update

but I have to do it in a function.

def distance(xy):
    a = np.repeat(xy, len(xy), axis=0)
    b = np.tile(xy, [len(xy), 1])
    return np.sqrt(np.sum((a - b) ** 2, axis=1))

d = distance(xy)
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • this code works but my array is in a (43,2) shape. how can I do it? when I apply this code to my original array it didn't work. ```array([[ 8139, 112607], [ 8139, 115665], [ 8132, 126563], [ 8193, 113938], [ 8193, 123714], [ 8156, 120291], [ 8373, 125253], [ 8400, 131442], [ 8400, 136354], [ 8401, 129352], [ 8439, 129909], [ 8430, 135706], [ 10378, 246013]], dtype=int64)``` – S - Jan 17 '22 at 19:40
  • What is the error message? – Corralien Jan 17 '22 at 19:53
  • oh i fixed it but i have another error. it says NameError: name 'int64' is not defined. Because as you can see there is dtype=int64 thing at the end of my array. how can i get rid of this? Can you help me please :(( – S - Jan 17 '22 at 19:58
  • how can i remove it I'm new in python I'm so sorry – S - Jan 17 '22 at 20:03
  • `xy = np.array([[ 8139, 112607], [ 8139, 115665], [ 8132, 126563], [ 8193, 113938], [ 8193, 123714], [ 8156, 120291], [ 8373, 125253], [ 8400, 131442], [ 8400, 136354], [ 8401, 129352], [ 8439, 129909], [ 8430, 135706], [ 10378, 246013]])` – Corralien Jan 17 '22 at 20:09
  • but i have to do it in a function – S - Jan 17 '22 at 20:14
  • I updated my answer. Can you check it please? – Corralien Jan 17 '22 at 20:56