I am facing a problem for computing a large distance matrix. However, this is a specific distance matrix: it is a matrix of points which are in a unit cell. This function gets fractional coordinates (between 0 and 1 in all dimensions) and I would like to calculate the distance matrix accounting for the fact that there is an identical copy of the point in each neighbor of the unit cell, and therefore the correct distance may be with the copy rather than with the other point within the unit cell.
Do you know if anything can be done with scipy or numpy pre-coded C libraries for this? I've done a numba code which works but runs rather slowly. Here I have a list of 13160 points for which I want to calculate a 13160*13160 distance matrix, ie that contains 173185600 elements.
The principle is: for each coordinate, calculate the square fractional distance of the first point with the second point either within the cell, or in one of its two neigbhors (behind and in front). Then get the minimum of the square distance for each coordinate and get the corresponding Euclidian distance from the cartesian coordinates.
The time it currently takes is: 40.82661843299866 seconds
Do you know if I can make it run faster by any means, or is my dataset just large and there is nothing more to be done?
Below is the code:
def getDistInCell(fract, xyz, n_sg, a, b, c): #calculates the distance matrix accounting for symmetry
dist = np.zeros((n_sg, n_sg))
for i in range(n_sg):
for j in range(n_sg):
#we evaluate the closest segment according to translation to neighbouring cells
diff_x = np.zeros((3))
diff_y = np.zeros((3))
diff_z = np.zeros((3))
diff_x[0] = (fract[i][0] - (fract[j][0] - 1))**2
diff_x[1] = (fract[i][0] - (fract[j][0] ))**2
diff_x[2] = (fract[i][0] - (fract[j][0] + 1))**2
diff_y[0] = (fract[i][1] - (fract[j][1] - 1))**2
diff_y[1] = (fract[i][1] - (fract[j][1] ))**2
diff_y[2] = (fract[i][1] - (fract[j][1] + 1))**2
diff_z[0] = (fract[i][2] - (fract[j][2] - 1))**2
diff_z[1] = (fract[i][2] - (fract[j][2] ))**2
diff_z[2] = (fract[i][2] - (fract[j][2] + 1))**2
#get correct shifts
shx = np.argmin(diff_x) - 1
shy = np.argmin(diff_y) - 1
shz = np.argmin(diff_z) - 1
#compute cartesian distance
dist[i][j] = np.sqrt((xyz[i][0] - (xyz[j][0] + shx * a)) ** 2 + (xyz[i][1] - (xyz[j][1] + shy * b)) ** 2 + (xyz[i][2] - (xyz[j][2] + shz * c)) ** 2)
return dist