I am trying to find pairs of (x,y) points within a maximum distance of each other. I thought the simplest thing to do would be to generate a DataFrame and go through each point, one by one, calculating if there are points with coordinates (x,y) within distance r of the given point (x_0, y_0). Then, divide the total number of discovered pairs by 2.
%pylab inline
import pandas as pd

def find_nbrs(low, high, num, max_d):
    # Draw num random points in the square [low, high) x [low, high)
    x = random.uniform(low, high, num)
    y = random.uniform(low, high, num)
    points = pd.DataFrame({'x':x, 'y':y})

    tot_nbrs = 0

    for i in arange(len(points)):
        x_0 = points.x[i]
        y_0 = points.y[i]

        # Select every point closer than max_d to (x_0, y_0)
        pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
        tot_nbrs += len(pt_nbrz)
        plot(pt_nbrz.x, pt_nbrz.y, 'r-')

    plot(points.x, points.y, 'b.')
    return tot_nbrs

print(find_nbrs(0, 1, 50, 0.1))
First of all, it's not always finding the right pairs (I see points that are within the stated distance that are not labeled).
If I write
plot(..., 'or')
instead, it highlights all the points, which means that
pt_nbrz = points[((x_0 - points.x)**2 + (y_0 - points.y)**2) < max_d**2]
returns at least one (x,y) for every point. Why? Shouldn't it return an empty array if the comparison is False?
How do I do all of the above more elegantly in Pandas? For example, without having to loop through each element.
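One observation that may explain the non-empty result: the mask compares (x_0, y_0) against every row, including its own, and a point is always at distance 0 from itself, so each selection contains at least that one row. Below is a minimal loop-free sketch of the pair count, not taken from the original post. It assumes NumPy broadcasting is acceptable in place of a pure Pandas expression; the function name find_pairs is hypothetical.

```python
import numpy as np

def find_pairs(x, y, max_d):
    """Return an (n_pairs, 2) array of index pairs (i, j), i < j,
    whose points lie closer than max_d to each other."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Broadcasting builds the full n-by-n matrix of squared distances.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    close = dx**2 + dy**2 < max_d**2

    # Keep only the strict upper triangle (i < j). This drops each point's
    # zero-distance match with itself and counts every pair exactly once,
    # so no division by 2 is needed afterwards.
    rows, cols = np.nonzero(np.triu(close, k=1))
    return np.column_stack((rows, cols))
```

With the DataFrame from the question, this could be called as find_pairs(points.x, points.y, 0.1), and len() of the result gives the number of pairs. The distance matrix needs memory proportional to n squared, so for large point sets a spatial index such as scipy.spatial.cKDTree with its query_pairs method is likely a better fit.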