How to speed up a searching algorithm

Question

I have two different catalogues (data_in and data_out) with x,y coordinates of many points. As shown below I need to find all data_in points that are near to data_out points, in particular all data_in points within a circle of radius r_search, centered on every data_out points.

This little script works fine, but it is very slow. There is a way to speed up the process?

import numpy as np

data_in = np.genfromtxt(file_in)
x_in = np.array(data_in[:,1])
y_in = np.array(data_in[:,2])

data_out = np.genfromtxt(file_out)
x_out = np.array(data_out[:,1])
y_out = np.array(data_out[:,2])

r_search = 5

a=0
for i in range(len(x_out)):
    for j in range(len(x_in)):
                delta_x = abs(x_in[j] - x_out[i])
                delta_y = abs(y_in[j] - y_out[i])
                r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
                if (r_tmp <= r_search):
                    a=a+1

X = np.zeros(a)
Y = np.zeros(a)
a=0
for i in range(len(x_out)):
    for j in range(len(x_in)):
                delta_x = abs(x_in[j] - x_out[i])
                delta_y = abs(y_in[j] - y_out[i])
                r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
                if (r_tmp <= r_search):
                    X[a] = x_in[j]
                    Y[a] = y_in[j]
                    a=a+1

This sounds like a job for a [KD-tree](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html). — user2357112, Jan 30 '19 at 18:52
How about a [quad tree](https://www.i-programmer.info/programming/theory/1679-quadtrees-and-octrees.html) — Felipe Gutierrez, Jan 30 '19 at 18:59

score 1 · Answer 1 · edited Jan 31 '19 at 12:36

1

The fastest algorithm for the application you describe is a k-dimensional tree, in particular, a 2-dimensional tree, better known as quad-tree. The way it works is by partitioning the big array you have into smaller arrays containing groups of close points.

Implementing it yourself is possible but not recommended since there are libraries that are way more optimised than you could do yourself. There were some discussion on SO some years ago on which is the best. But now it is acceptable to say that this library is the best.

edited Jan 31 '19 at 12:36

tripleee

175,061
34
275
318

answered Jan 30 '19 at 19:10

Benoît P

3,179
13
31

`scipy.spatial.KDTree.query_ball_point` in particular seems very good! – Alessandro Peca Jan 31 '19 at 10:08

score 1 · Answer 2 · answered Jan 30 '19 at 20:12

One possible optimisation is to remove the (expensive) square root from the distance calculation and compare with the squared search radius, so this:

r_tmp = np.sqrt(np.power(delta_x,2) + np.power(delta_y,2))
if (r_tmp <= r_search):

becomes:

r_tmp = np.power(delta_x,2) + np.power(delta_y,2)
if (r_tmp <= r_search*r_search):

How to speed up a searching algorithm

2 Answers2