0

I have two different catalogues (data_in and data_out) with x,y coordinates of many points. As shown below I need to find all data_in points that are near to data_out points, in particular all data_in points within a circle of radius r_search, centered on every data_out points.

This little script works fine, but it is very slow. There is a way to speed up the process?

import numpy as np

data_in = np.genfromtxt(file_in)
x_in = np.array(data_in[:,1])
y_in = np.array(data_in[:,2])

data_out = np.genfromtxt(file_out)
x_out = np.array(data_out[:,1])
y_out = np.array(data_out[:,2])

r_search = 5

a=0
for i in range(len(x_out)):
    for j in range(len(x_in)):
                delta_x = abs(x_in[j] - x_out[i])
                delta_y = abs(y_in[j] - y_out[i])
                r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
                if (r_tmp <= r_search):
                    a=a+1

X = np.zeros(a)
Y = np.zeros(a)
a=0
for i in range(len(x_out)):
    for j in range(len(x_in)):
                delta_x = abs(x_in[j] - x_out[i])
                delta_y = abs(y_in[j] - y_out[i])
                r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
                if (r_tmp <= r_search):
                    X[a] = x_in[j]
                    Y[a] = y_in[j]
                    a=a+1
Alessandro Peca
  • 873
  • 1
  • 15
  • 40

2 Answers2

1

The fastest algorithm for the application you describe is a k-dimensional tree, in particular, a 2-dimensional tree, better known as quad-tree. The way it works is by partitioning the big array you have into smaller arrays containing groups of close points.

Implementing it yourself is possible but not recommended since there are libraries that are way more optimised than you could do yourself. There were some discussion on SO some years ago on which is the best. But now it is acceptable to say that this library is the best.

tripleee
  • 175,061
  • 34
  • 275
  • 318
Benoît P
  • 3,179
  • 13
  • 31
1

One possible optimisation is to remove the (expensive) square root from the distance calculation and compare with the squared search radius, so this:

r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
if (r_tmp <= r_search):

becomes:

r_tmp = np.power(delta_x,2) + np.power(delta_y,2)
if (r_tmp <= r_search*r_search):
Mark Setchell
  • 191,897
  • 31
  • 273
  • 432