I have N
dataframes where two of the columns hold longitude and latitude data, tracking the motion of a car. The general trace of the car is the same for all dataframes but since the tracking sometimes starts a bit late, or end a bit early, the length of the dataframes differ.
I want the dataframes to "line up", i.e. trim the rows which corresponds to "non overlapping positional data". I expect the result to be N
dataframes of equal length s.t. the positional data is the same for all dataframes.
Example
Three arbitrary dataframes look like this:
time speed longitude latitude
t00 v00 19.70 48.67
t01 v01 19.71 48.65
t02 v02 19.72 48.64
t03 v03 19.73 48.64
t04 v04 19.74 48.63
t05 v05 19.74 48.63
t06 v06 19.75 48.64
t07 v07 19.75 48.64
t08 v08 19.75 48.64
t09 v09 19.75 48.64
time speed longitude latitude
t10 v10 19.72 48.64
t11 v11 19.73 48.64
t12 v12 19.74 48.63
t13 v13 19.74 48.63
t14 v14 19.75 48.64
t15 v15 19.75 48.64
t16 v16 19.75 48.64
time speed longitude latitude
t20 v20 19.72 48.64
t21 v21 19.73 48.64
t22 v22 19.74 48.63
t23 v23 19.74 48.63
t24 v24 19.75 48.64
t25 v25 19.75 48.63
t26 v26 19.75 48.64
t27 v27 19.75 48.64
t28 v28 19.75 48.64
The result should be three new data frames:
time speed longitude latitude
t02 v02 19.72 48.64
t03 v03 19.73 48.64
t04 v04 19.74 48.63
t05 v05 19.74 48.63
t06 v06 19.75 48.64
time speed longitude latitude
t10 v10 19.72 48.64
t11 v11 19.73 48.64
t12 v12 19.74 48.63
t13 v13 19.74 48.63
t14 v14 19.75 48.64
time speed longitude latitude
t20 v20 19.72 48.64
t21 v21 19.73 48.64
t22 v22 19.74 48.63
t23 v23 19.74 48.63
t24 v24 19.75 48.64
In reality, the number of overlaying coordinates will be higher, but I hope this shows the gist of it.
I found this post where the intersection between two lists is retrieved. I tried to extract the positional data from the dataframe and then extract only the rows with matching coordinates from all dataframes, but this fails due to the number of rows differing between the dataframes.
My current code is seen below:
first_route = True
for route in routes: # extract all route's coordinates
lon = route["longitude"].values.tolist()
lat = route["latitude"].values.tolist()
if first_route: # add first route regardless
cropped_lon = lon
cropped_lat = lat
first_route = False
continue
old_lon = collections.Counter(cropped_lon)
old_lat = collections.Counter(cropped_lat)
new_lon = collections.Counter(lon)
new_lat = collections.Counter(lat)
cropped_lon = list((old_lon & new_lon).elements())
cropped_lat = list((old_lat & new_lat).elements())
cropped_lon = np.asarray(cropped_lon)
cropped_lat = np.asarray(cropped_lat)
# THIS fails due to length difference
# Here I want to extract all rows which satisfy the positional restrictions
for route in routes:
print(route[route.longitude == cropped_lon and route.latitude == cropped_lat])
I am totally willing to discard my whole idea if anyone has a better idea.
Update
The accepted answer solves the problem in the title, but I am looking for an extended solution. I hope it can be achieved in a similar way, why I leave it as an update.
My actual coordinate data has much higher resolution (6 decimals) but the measurements are not exact enough. The result is that the code in the accepted answer yields empty dataframes. I could take the shortest dataframe and then "slide" all other dataframes over it in order to do a least square fit, but I'm hoping there's a solution more similar to the one below.