
I have two DataFrames (df1, df2) with differing sizes, but the same overall columns. Both have time stamps and latitude and longitude points. The time stamps and coordinates are the same for many points because of the frequency at which the data was collected. Here is an example of the DataFrame:

time_local                  Lat      Long
2021-09-08 12:56:32-04:00   37.1455  -85.0555
2021-09-08 12:56:32-04:00   37.1455  -85.0555
2021-09-08 12:56:32-04:00   37.1455  -85.0555
...                         ...      ...

The second DataFrame is the same; however, there are differences in some of the coordinate points throughout. I want to select the points in the first dataframe (df1) closest to the points in the second dataframe (df2); for example, if I had the following coordinate base points of (37.1455, -85.0555) and then (37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556) then the closest point selected would be (37.1455, -85.0555).

Is there a function within Python that can do this easily enough?

1 Answer


Yes, what we need here is some math: the distance formula between two points (x1, y1) and (x2, y2):

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Here, (x1, y1) is the first point and (x2, y2) is the second.
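A quick worked check of that formula on the question's points, taking x as latitude and y as longitude with origin (37.1455, -85.0555), as the code below does:

$$
\begin{aligned}
d\big((37.1455, -85.0555), (37.1454, -85.0555)\big) &= \sqrt{(-0.0001)^2 + 0^2} = 0.0001 \\
d\big((37.1455, -85.0555), (37.1454, -85.0556)\big) &= \sqrt{(-0.0001)^2 + (-0.0001)^2} \approx 0.000141 \\
d\big((37.1455, -85.0555), (37.1453, -85.0556)\big) &= \sqrt{(-0.0002)^2 + (-0.0001)^2} \approx 0.000224
\end{aligned}
$$

So on a flat plane the smallest distance belongs to (37.1454, -85.0555).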

In code, treating latitude and longitude as x and y on a flat Cartesian plane:

points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)

def distance(coord1, coord2):
    x1, y1 = coord1
    x2, y2 = coord2
    return ((x2 - x1)**2 + (y2 - y1)**2)**0.5  # raising to 0.5 is the square root

def closest_point(origin, points):
    distances = [distance(origin, point) for point in points]
    return points[distances.index(min(distances))]  # point at the index of the smallest distance

print(closest_point(origin, points))
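When there are many points (for example a whole DataFrame column), the same Euclidean search can be vectorised with NumPy instead of a Python loop. This is a minimal sketch: `closest_point_np` is a hypothetical name, and like the code above it still treats degrees as flat x/y values.

```python
import numpy as np

def closest_point_np(origin, points):
    """Hypothetical helper: point in `points` with the smallest flat Euclidean distance to `origin`."""
    pts = np.asarray(points, dtype=float)          # shape (n, 2): one (lat, long) pair per row
    diffs = pts - np.asarray(origin, dtype=float)  # broadcast the origin over every row
    dists = np.sqrt((diffs ** 2).sum(axis=1))      # same distance formula, computed for all rows at once
    best = pts[np.argmin(dists)]                   # row with the smallest distance
    return tuple(float(v) for v in best)           # plain Python floats for readable output

points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)
print(closest_point_np(origin, points))
```

`np.argmin` returns the first index on ties, which matches the `list.index(min(...))` behaviour of the loop version.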

If only the y-coordinate (longitude) matters:

points = [(37.1454, -85.0555), (37.1454, -85.0556), (37.1453, -85.0556)]
origin = (37.1455, -85.0555)

def closest_point(origin, points):
    # absolute difference, so points on either side of the origin are treated alike
    distances = [abs(origin[1] - point[1]) for point in points]
    return points[distances.index(min(distances))]

print(closest_point(origin, points))
The Myth
  • This assumes a flat surface – RJ Adriaansen Nov 01 '22 at 15:26
  • Their requirement is `lat` and `long`, not 3D figures, so the only factors to consider are `x` and `y`; `z` plays no role here. – The Myth Nov 01 '22 at 15:26
  • Sorry, I meant cartesian plane. But the issue is that these are geolocations, so haversine might be more apt – RJ Adriaansen Nov 01 '22 at 15:30
  • Being 1 degree off on the equator is vastly different from being one degree off on the poles. Look here for how to implement a distance calculation that accounts for this: [Haversine Formula in Python (Bearing and Distance between two GPS points)](https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points) – the_strange Nov 01 '22 at 15:30
  • Yes, if they have more data. – The Myth Nov 01 '22 at 15:38
  • Using your example worked, but it wouldn't allow me to perform the function using dataframes. Any advice on how to make it work with dataframes? – matrix_season Nov 01 '22 at 20:35
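Following up on the haversine suggestion and the DataFrame question, here is one possible sketch that finds, for every row of df2, the nearest row of df1. It assumes columns named `Lat` and `Long` as in the question, a spherical Earth with a mean radius of 6,371 km, and that a len(df2) × len(df1) distance matrix fits in memory. `haversine_m` and `nearest_in_df1` are hypothetical helper names, not a pandas API.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth, mean radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between points given in degrees (NumPy-broadcastable)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def nearest_in_df1(df1, df2, lat="Lat", lon="Long"):
    """Hypothetical helper: attach the closest df1 row to every row of df2."""
    # df2 as a column (n2, 1) against df1 as a row (1, n1) -> (n2, n1) distance matrix
    d = haversine_m(df2[lat].to_numpy()[:, None], df2[lon].to_numpy()[:, None],
                    df1[lat].to_numpy()[None, :], df1[lon].to_numpy()[None, :])
    idx = d.argmin(axis=1)  # position of the nearest df1 row for each df2 row
    matched = df1.iloc[idx].reset_index(drop=True).add_suffix("_df1")
    out = pd.concat([df2.reset_index(drop=True), matched], axis=1)
    out["dist_m"] = d[np.arange(len(df2)), idx]  # distance to the chosen df1 row
    return out


# Toy data using the question's column names (time_local omitted for brevity)
df1 = pd.DataFrame({"Lat": [37.1455, 37.1454, 37.1453],
                    "Long": [-85.0555, -85.0556, -85.0556]})
df2 = pd.DataFrame({"Lat": [37.1454, 37.1453],
                    "Long": [-85.0555, -85.0557]})
print(nearest_in_df1(df1, df2))
```

With these toy frames, the first df2 row is matched to df1 row 1 (about 8.9 m away along longitude) rather than row 0 (about 11.1 m away along latitude), because at roughly 37° N a degree of longitude is shorter than a degree of latitude; the flat-plane formula scores both as roughly 0.0001 degrees and cannot tell them apart. For large frames, scikit-learn's `BallTree` with `metric="haversine"` (inputs in radians, latitude first) avoids building the full distance matrix.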