import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
# Sample data
df = pd.DataFrame({'id':list('abcde'),'latitude': [38.470628, 37.994155, 38.66937, 34.119578, 36.292307],'longitude': [-121.404586, -121.802341, -121.295325, -117.413791, -119.804074]}) #sample
# Extract lat/long (converted to radians when building the tree) and express the search radius on the unit sphere (distance / Earth radius).
coords = df[["latitude","longitude"]]
distance_in_miles = 50
earth_radius_in_miles = 3958.8
radius = distance_in_miles / earth_radius_in_miles
tree = BallTree( np.radians(coords), leaf_size=10, metric='haversine')
tree.query_radius( np.radians(coords), r=radius, count_only=True)
Which gives array([3, 2, 2, 1, 1]), i.e. for each point the number of points within 50 miles, the point itself included.
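As a sanity check (a brute-force O(n²) comparison, not part of the BallTree approach and only practical for small data), scikit-learn's haversine_distances should reproduce the same counts. It also makes explicit that radius = 50 / 3958.8 ≈ 0.01263 is a central angle in radians, which is the unit the haversine metric works in; multiplying by the Earth radius turns it back into miles.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

# Same sample points as above
df = pd.DataFrame({
    'id': list('abcde'),
    'latitude': [38.470628, 37.994155, 38.66937, 34.119578, 36.292307],
    'longitude': [-121.404586, -121.802341, -121.295325, -117.413791, -119.804074],
})

distance_in_miles = 50
earth_radius_in_miles = 3958.8

# haversine_distances expects [lat, lon] in radians and returns distances on the
# unit sphere (central angles in radians); scale by the Earth radius to get miles
coords_rad = np.radians(df[["latitude", "longitude"]].to_numpy())
dist_miles = haversine_distances(coords_rad) * earth_radius_in_miles

# Count points within the radius; the zero diagonal means each point counts itself
counts = (dist_miles <= distance_in_miles).sum(axis=1)
print(counts)  # [3 2 2 1 1]
```

Here a and b are roughly 39 miles apart and a and c roughly 15, while b and c are just over 50 miles apart, which is why a has three points in range and b and c only two each.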
If you want to return the indices and use them for aggregates, one way is:
df = pd.DataFrame({'id':list('abcde'),'latitude': [38.470628, 37.994155, 38.66937, 34.119578, 36.292307],'longitude': [-121.404586, -121.802341, -121.295325, -117.413791, -119.804074], 'saleprice_usd_per_sqf': [200, 300, 700, 350, 50]})
coords = df[["latitude","longitude"]]
distance_in_miles = 50
earth_radius_in_miles = 3958.8
radius = distance_in_miles / earth_radius_in_miles
# Note: we retrieve the indices here, not only the count
tree = BallTree( np.radians(coords), leaf_size=10, metric='haversine')
indici = tree.query_radius( np.radians(coords), r=radius, count_only=False)
Then use a list comprehension to, for instance, get the median value within each radius. Be aware that the point itself is always included in its own radius.
[np.median(df.saleprice_usd_per_sqf.values[idx]) for idx in indici]
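A sketch of one way to store the result as a column and, if needed, leave the point itself out of its own aggregate by filtering its row position from each index array. The column names median_incl_self / median_excl_self and the helper neighbour_median are illustrative only; returning NaN for points without other neighbours is an assumption (np.median of an empty array would otherwise warn and return NaN anyway).

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

df = pd.DataFrame({
    'id': list('abcde'),
    'latitude': [38.470628, 37.994155, 38.66937, 34.119578, 36.292307],
    'longitude': [-121.404586, -121.802341, -121.295325, -117.413791, -119.804074],
    'saleprice_usd_per_sqf': [200, 300, 700, 350, 50],
})

distance_in_miles = 50
earth_radius_in_miles = 3958.8
radius = distance_in_miles / earth_radius_in_miles

coords_rad = np.radians(df[["latitude", "longitude"]].to_numpy())
tree = BallTree(coords_rad, leaf_size=10, metric='haversine')
indici = tree.query_radius(coords_rad, r=radius)  # one array of row positions per point

prices = df['saleprice_usd_per_sqf'].to_numpy()

# Median including the point itself, as in the list comprehension above
df['median_incl_self'] = [np.median(prices[idx]) for idx in indici]

# Hypothetical helper: median over neighbours only, dropping the point's own position
def neighbour_median(i, idx):
    others = idx[idx != i]
    return np.median(prices[others]) if others.size else np.nan

df['median_excl_self'] = [neighbour_median(i, idx) for i, idx in enumerate(indici)]
print(df[['id', 'median_incl_self', 'median_excl_self']])
```

Because query_radius returns positions into the array passed to BallTree, filtering with enumerate works regardless of the DataFrame's index labels.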