First, it appears that your longitudes and latitudes are locations on Earth. Assuming that Earth is a sphere, the distance between two points should be computed as the great-circle distance, not as the Euclidean distance that you get using cdist.
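To illustrate the difference, here is a minimal sketch of the haversine formula, one common way to compute great-circle distance on a sphere. The function name `great_circle_distance` and the constant `EARTH_RADIUS_M` are my own choices for this example, and the mean radius value is an assumption of the spherical model:

```python
import math

# Assumption: spherical Earth with a mean radius of about 6371 km.
EARTH_RADIUS_M = 6371008.8


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Haversine distance (in the units of `radius`) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


# Points 1 and 2 from the question's data, as (latitude, longitude)
p1 = (41.894577, -87.645307)
p2 = (41.894647, -87.640426)

d_gc = great_circle_distance(p1[0], p1[1], p2[0], p2[1])
# Euclidean distance on the raw numbers: the result is in "degrees",
# and it weights a degree of longitude the same as a degree of latitude.
d_euclid_deg = math.hypot(p2[0] - p1[0], p2[1] - p1[1])

print(f"great-circle: {d_gc:.1f} m, Euclidean on raw degrees: {d_euclid_deg:.6f}")
```

For these two points the great-circle distance comes out at roughly 400 m, whereas the Euclidean value is a small number in degrees with no direct physical meaning. At this latitude a degree of longitude is only about three quarters as long as a degree of latitude, so nearest-neighbor rankings based on raw degrees can be skewed.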
The easiest approach from a programming point of view (apart from the learning curve) is to use the astropy package. Its documentation is fairly good and often includes useful examples; see, e.g., match_coordinates_sky() or the guide on catalog matching with astropy.
Then you might do something like this:
>>> from astropy.units import Quantity
>>> from astropy.coordinates import match_coordinates_sky, SkyCoord, EarthLocation
>>> from pandas import DataFrame
>>> import numpy as np
>>>
>>> # Create your data as I understood it:
>>> all_points = DataFrame({'point_id': np.arange(1,6), 'latitude': [41.894577, 41.894647, 41.894713, 41.894768, 41.894830], 'longitude': [-87.645307, -87.640426, -87.635513, -87.630629, -87.625793 ]})
>>> parent_pts = DataFrame({'parent_id': [1, 2]})
>>>
>>> # Create a frame with the coordinates of the "parent" points:
>>> parent_coord = all_points.loc[all_points['point_id'].isin(parent_pts['parent_id'])]
>>> print(parent_coord)
latitude longitude point_id
0 41.894577 -87.645307 1
1 41.894647 -87.640426 2
>>>
>>> # Create coordinate array for "points" (in principle the below statements
>>> # could be combined into a single one):
>>> all_lon = Quantity(all_points['longitude'], unit='deg')
>>> all_lat = Quantity(all_points['latitude'], unit='deg')
>>> all_pts = SkyCoord(EarthLocation.from_geodetic(all_lon, all_lat).itrs, frame='itrs')
>>>
>>> # Create coordinate array for "parent points":
>>> parent_lon = Quantity(parent_coord['longitude'], unit='deg')
>>> parent_lat = Quantity(parent_coord['latitude'], unit='deg')
>>> parent_catalog = SkyCoord(EarthLocation.from_geodetic(parent_lon, parent_lat).itrs, frame='itrs')
>>>
>>> # Get the indices (in parent_catalog) of parent coordinates
>>> # closest to each point:
>>> matched_indices = match_coordinates_sky(all_pts, parent_catalog)[0]
Downloading http://maia.usno.navy.mil/ser7/finals2000A.all
|========================================================================| 3.1M/3.1M (100.00%) 0s
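>>> # matched_indices are positions in parent_catalog, which was built from
>>> # parent_coord, so look the ids up there (by position, not by label):
>>> all_points['parent_id'] = parent_coord['point_id'].values[matched_indices]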
>>> print(all_points)
latitude longitude point_id parent_id
0 41.894577 -87.645307 1 1
1 41.894647 -87.640426 2 2
2 41.894713 -87.635513 3 2
3 41.894768 -87.630629 4 2
4 41.894830 -87.625793 5 2
I would like to add that match_coordinates_sky() returns not only the matching indices but also the angular separation between each data point and its matched "parent" point, as well as the 3D (straight-line) distance between them, which is in meters here. These may be useful for your problem.
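If you want to see what those three kinds of values look like without installing astropy, below is a dependency-free sketch that produces analogous results by brute force. It is not astropy's implementation: the helper names (`unit_vector`, `angular_separation`, `match_nearest`) are hypothetical, and the spherical Earth radius is an assumption.

```python
import math

# Assumption: spherical Earth with a mean radius of about 6371 km.
EARTH_RADIUS_M = 6371008.8


def unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def angular_separation(u, v):
    """Angle in radians between two unit vectors."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    # atan2(|u x v|, u . v) stays accurate for very small angles
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot)


def match_nearest(points, parents):
    """For each point return (parent index, separation in rad, arc in m, chord in m)."""
    parent_vecs = [unit_vector(lat, lon) for lat, lon in parents]
    results = []
    for lat, lon in points:
        u = unit_vector(lat, lon)
        seps = [angular_separation(u, v) for v in parent_vecs]
        idx = min(range(len(seps)), key=seps.__getitem__)
        sep = seps[idx]
        arc = EARTH_RADIUS_M * sep                      # along the surface
        chord = 2 * EARTH_RADIUS_M * math.sin(sep / 2)  # straight line through the Earth
        results.append((idx, sep, arc, chord))
    return results


# The question's data as (latitude, longitude); the first two points are the parents
points = [(41.894577, -87.645307), (41.894647, -87.640426),
          (41.894713, -87.635513), (41.894768, -87.630629),
          (41.894830, -87.625793)]
parents = points[:2]

matches = match_nearest(points, parents)
for point_id, (idx, sep, arc, chord) in enumerate(matches, start=1):
    print(point_id, idx, math.degrees(sep), arc, chord)
```

The brute-force loop is O(N×M), which is fine for a handful of points; for large catalogs a spatial index such as the k-d tree that astropy uses internally is the better choice. Note the distinction between the arc length along the surface and the straight-line chord: for separations of a few hundred meters they are practically identical, but they diverge over long distances.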