0

I am facing a problem for computing a large distance matrix. However, this is a specific distance matrix: it is a matrix of points which are in a unit cell. This function gets fractional coordinates (between 0 and 1 in all dimensions) and I would like to calculate the distance matrix accounting for the fact that there is an identical copy of the point in each neighbor of the unit cell, and therefore the correct distance may be with the copy rather than with the other point within the unit cell.

Do you know if anything can be done with scipy or numpy pre-coded C libraries for this? I've done a numba code which works but runs rather slowly. Here I have a list of 13160 points for which I want to calculate a 13160*13160 distance matrix, ie that contains 173185600 elements.

The principle is: for each coordinate, calculate the square fractional distance of the first point with the second point either within the cell, or in one of its two neigbhors (behind and in front). Then get the minimum of the square distance for each coordinate and get the corresponding Euclidian distance from the cartesian coordinates.

The time it currently takes is: 40.82661843299866 seconds

Do you know if I can make it run faster by any means, or is my dataset just large and there is nothing more to be done?

Below is the code:

def getDistInCell(fract, xyz, n_sg, a, b, c):                           #calculates the distance matrix accounting for symmetry   
    dist = np.zeros((n_sg, n_sg))
    for i in range(n_sg):
        for j in range(n_sg):
            #we evaluate the closest segment according to translation to neighbouring cells
            diff_x = np.zeros((3))
            diff_y = np.zeros((3))
            diff_z = np.zeros((3))
            
            diff_x[0] = (fract[i][0] - (fract[j][0] - 1))**2
            diff_x[1] = (fract[i][0] - (fract[j][0]    ))**2
            diff_x[2] = (fract[i][0] - (fract[j][0] + 1))**2
            
            diff_y[0] = (fract[i][1] - (fract[j][1] - 1))**2
            diff_y[1] = (fract[i][1] - (fract[j][1]    ))**2
            diff_y[2] = (fract[i][1] - (fract[j][1] + 1))**2
            
            diff_z[0] = (fract[i][2] - (fract[j][2] - 1))**2
            diff_z[1] = (fract[i][2] - (fract[j][2]    ))**2
            diff_z[2] = (fract[i][2] - (fract[j][2] + 1))**2
            
            #get correct shifts
            shx = np.argmin(diff_x) - 1
            shy = np.argmin(diff_y) - 1
            shz = np.argmin(diff_z) - 1
            
            #compute cartesian distance
            dist[i][j] = np.sqrt((xyz[i][0] - (xyz[j][0] + shx * a)) ** 2 + (xyz[i][1] - (xyz[j][1] + shy * b)) ** 2 + (xyz[i][2] - (xyz[j][2] + shz * c)) ** 2)
    
    return dist
talonmies
  • 70,661
  • 34
  • 192
  • 269
Glxblt76
  • 365
  • 5
  • 14
  • Please give provide some (deterministic) input for the code. Also, it would be good to have a clear formula for `dist[i][j]` (instead of trying to extract it from the code). Is there a mathematical interpretation for it? – dan_fulea Nov 30 '21 at 11:35
  • https://stackoverflow.com/questions/52649815/faster-code-to-calculate-distance-between-points-in-numpy-array-with-cyclic-per and https://stackoverflow.com/questions/67035793/how-to-implement-in-python-a-function-to-compute-the-euclidean-distance-between – Willem Hendriks Nov 30 '21 at 12:15

1 Answers1

0

Here is a sketch of a solution based on BallTree

I create random points, 13160

import numpy as np

n=13160 
np.random.seed(1)

points=np.random.uniform(size=(n,3))

Create mirrors/symmetries, e.g.

from itertools import product

def create_symmetries( points ):
    
    symmetries = []
    
    for sym in product([0,-1,1],[0,-1,1],[0,-1,1]):
        new_symmetry = points.copy()
        
        diff_x, diff_y, diff_z = sym
        new_symmetry[:,0] = new_symmetry[:,0] + diff_x
        new_symmetry[:,1] = new_symmetry[:,1] + diff_y
        new_symmetry[:,2] = new_symmetry[:,2] + diff_z

        symmetries.append(new_symmetry)
    
    return symmetries

and create a larger datasets including symmetries;

all_symmetries = np.concatenate( create_symmetries(points) )

To get the closest one, use, k=2 as the closest one is the point itself, and the 2nd closest is whatever symmetry is the closest (including its own, so be careful there)

%%time 
import numpy as np
from sklearn.neighbors import BallTree


tree = BallTree(all_symmetries, leaf_size=15, metric='euclidean')

dist, idx = tree.query(points, k=2, return_distance=True)

This takes < 500ms

CPU times: user 275 ms, sys: 2.77 ms, total: 277 ms
Wall time: 275 ms
Willem Hendriks
  • 1,267
  • 2
  • 9
  • 15
  • Sorry, I don't understand what Neighbours_1[:,1] -1 does exactly. Could you break this down for me? I need the following copies: - cell on left, cell on right - cell on top, cell on bottom - cell on front, cell on back – Glxblt76 Nov 30 '21 at 13:52
  • I have updated with symmetires - but not sure if these are all of them. Idea is you just copy the data and check the closest... – Willem Hendriks Nov 30 '21 at 14:56