How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

Question

Given a 10x10 grid (2d-array) filled randomly with numbers, either 0, 1 or 2. How can I find the Euclidean distance (the l2-norm of the distance vector) between two given points considering periodic boundaries?

Let us consider an arbitrary grid point called centre. Now, I want to find the nearest grid point containing the same value as centre. I need to take periodic boundaries into account, such that the matrix/grid can be seen rather as a torus instead of a flat plane. In that case, say the centre = matrix[0,2], and we find that there is the same number in matrix[9,2], which would be at the southern boundary of the matrix. The Euclidean distance computed with my code would be for this example np.sqrt(0**2 + 9**2) = 9.0. However, because of periodic boundaries, the distance should actually be 1, because matrix[9,2] is the northern neighbour of matrix[0,2]. Hence, if periodic boundary values are implemented correctly, distances of magnitude above 8 should not exist.

So, I would be interested on how to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus by applying a wrap-around for the boundaries.

import numpy as np

matrix = np.random.randint(0,3,(10,10))
centre = matrix[0,2]

#rewrite the centre to be the number 5 (to exclude itself as shortest distance)
matrix[0,2] = 5

#find the points where entries are same as centre
same = np.where((matrix == centre) == True)
idx_row, idx_col = same

#find distances from centre to all values which are of same value 
dist = np.zeros(len(same[0]))
for i in range(0,len(same[0])):
    delta_row = same[0][i] - 0 #row coord of centre
    delta_col = same[1][i] - 2 #col coord of centre
    dist[i] = np.sqrt(delta_row**2 + delta_col**2)

#retrieve the index of the smallest distance
idx = dist.argmin() 
print('Centre value: %i. The nearest cell with same value is at (%i,%i)' 
      % (centre, same[0][idx],same[1][idx]))

If the x and y boundaries individually "wrap around" to the other side, it's not topologically a sphere, but a torus. On a sphere, the entire top edge would be considered the same point, namely the north pole, and similarly for the bottom edge. — Thomas, Apr 10 '21 at 15:04
Also, there is no such thing as a Euclidean distance on a sphere. Euclidean distances are in the plane (or higher or lower dimensional Euclidean space) by definition. — Thomas, Apr 10 '21 at 15:05
Excuse me, my mistake, I meant a torus of course. Just edited the question accordingly. Regarding the Euclidean distance, I meant that I am interested in the l2-norm of the difference vector. Thank you for clarifying @Thomas — eldorado, Apr 10 '21 at 15:25
https://stackoverflow.com/questions/52649815/faster-code-to-calculate-distance-between-points-in-numpy-array-with-cyclic-per I believe that one is very similar. Is it correct that the cyclic-ness is matching your condition? — Willem Hendriks, Apr 10 '21 at 20:05
@user3184950 Thank you for the link, I have checked that particular post before, but I wasn't able to completely comprehend the content and thought, it is probably not quite what I need. Though, the cyclic boundary conditions from that post matches what I need. In my case however, I need to find for each specified number the nearest neighbour of same type, exchange each others values, and do the same loop again. I will try the answer below and see how severe the bottleneck will be. — eldorado, Apr 11 '21 at 00:08
Let me know If you like me to rewrite my answer posted there for your case — Willem Hendriks, Apr 11 '21 at 05:02
@user3184950 Sure, I'd be happy. I just went over your answer again and it seems like that with this approach, we iterate through all numbers and calculate the distance, right? There would be a huge efficacy improvement if we could stop the iteration as soon as we have found one in the neighbourhood (I actually don't care about the other ones which are further away, I just need to find the nearest and break) — eldorado, Apr 11 '21 at 11:47

score 1 · Answer 1 · answered Apr 10 '21 at 15:13

1

For each axis, you can check whether the distance is shorter when you wrap around or when you don't. Consider the row axis, with rows i and j.

When not wrapping around, the difference is abs(i - j).
When wrapping around, the difference is "flipped", as in 10 - abs(i - j). In your example with i == 0 and j == 9 you can check that this correctly produces a distance of 1.

Then simply take whichever is smaller:

delta_row = same[0][i] - 0 #row coord of centre
delta_row = min(delta_row, 10 - delta_row)

And similarly for delta_column.

The final dist[i] calculation needs no changes.

answered Apr 10 '21 at 15:13

Thomas

174,939
50
355
478

Thanks for the suggestion. I was initially trying to implement exactly this, but later when applying this procedure, the grid will actually be of size 1000x1000 or even 10000x10000. Because of that I was worried that this could result in a bottleneck, so I was thinking about how to implement this wrap-around without checking which one is shorter. What do you think, could this approach decrease the performance dramatically? – eldorado Apr 10 '21 at 15:33
If it turns out to be a bottleneck, you could probably change the code to use more vectorization. But (depending on your data) I think the real performance improvement would come from not searching all those 100 million elements to find the single one that is closest; rather, you could perform the search in a circle or spiral shape until you find a matching element. – Thomas Apr 10 '21 at 17:08
I would need to loop over each cell that contains that particular value, this is inevitable, but I would not need to compute all the distances actually, I just need to find the nearest one of the same type. So, searching in circular motion could be more efficient indeed, since I can stop as soon as I find one – eldorado Apr 10 '21 at 23:28
Follow up comment: I implemented this approach and ran the code 10 times to have a rough estimate of how long this routine takes for a 100x100 grid. Compared to my implementation where I take a random choice instead, it took me around 5x longer to complete the same calculations. – eldorado Apr 11 '21 at 11:29

Willem Hendriks · Answer 2 · 2021-04-14T12:27:55.027

I have a working 'sketch' of how this could work. In short, I calculate the distance 9 times, 1 for the normal distance, and 8 shifts to possibly correct for a closer 'torus' distance.

As n is getting larger, the calculation costs can go sky high as the numbers go up. But, the torus effect, is probably not needed as there is always a point nearby without 'wrap around'.

You can easily test this, because for a grid of size 1, if a point is found of distance 1/2 or closer, you know there is not a closer torus point (right?)

import numpy as np

n=10000

np.random.seed(1)

A = np.random.randint(low=0, high=10, size=(n,n))

I create 10000x10000 points, and store the location of the 1's in ONES.

ONES = np.argwhere(A == 0)

Now I define my torus distance, which is trying which of the 9 mirrors is the closest.

def distance_on_torus( point=[500,500] ):
    index_diff = [[1],[1],[0],[0],[0,1],[0,1],[0,1],[0,1]]
    coord_diff = [[-1],[1],[-1],[1],[-1,-1],[-1,1],[1,-1],[1,1]]
    
    tree = BallTree( ONES, leaf_size=5*n, metric='euclidean')
    
    dist, indi = tree.query([point],k=1, return_distance=True )

    distances = [dist[0]]

    for indici_to_shift, coord_direction in zip(index_diff, coord_diff):
        MIRROR = ONES.copy()
        for i,shift in zip(indici_to_shift,coord_direction):
            MIRROR[:,i] = MIRROR[:,i] + (shift * n)

        tree = BallTree( MIRROR, leaf_size=5*n, metric='euclidean')
        dist, indi = tree.query([point],k=1, return_distance=True )
        
        distances.append(dist[0])
        
    
    return np.min(distances)

%%time

distance_on_torus([2,3])

It is slow, the above takes 15 minutes.... For n = 1000 less than a second.

A optimisation would be to first consider the none-torus distance, and if the minimum distance is possibly not the smallest, calculate with only the minimum set of extra 'blocks' around. This will greatly increase speed.

How to implement in Python a function to compute the Euclidean distance between two arbitrary points on a torus

2 Answers2

Linked