I have a df
with flight times from A -> B/C and flights from B/C -> Z.
I'd like to find possible one-stop routings between A->Z. These would require a stop at B/C.
A simple merge does the trick.
routes = pd.merge(df , df , left_on = 'Destination' , right_on = 'Origin')
However, my schedule df has thousands of flights from A -> B/C/D/.../Y and thousands of flights from B/C/D/.../Y->Z. Merging this table with itself results in a routes
table that is billions of rows long. I can filter to a small df by filtering to routings with a layover of <24 hours.
routes = routes[routes['Time_y'] - routes['Time_x'] < 24]
To first compute all possible routings and then filter to those with a layover < 24 hours is computationally infeasible. Is there any way to perform the pd.merge
simultaneously with the condition that layover be < 24 hours? The pseudocode would be:
routes = pd.merge(df , df , left_on = 'Destination' , right_on = 'Origin' , right['Time'] - left['Time'] < 24)
Here's sample data:
df = pd.DataFrame({'Origin': {1: 'A', 6: 'A', 11: 'A', 16: 'A', 21: 'B', 26: 'B', 31: 'C', 36: 'C'}, 'Destination': {1: 'B', 6: 'B', 11: 'C', 16: 'C', 21: 'Z', 26: 'Z', 31: 'Z', 36: 'Z'}, 'Dept_Time': {1: pd.Timestamp('2019-03-30 17:31:00'), 6: pd.Timestamp('2019-05-16 17:32:00'), 11: pd.Timestamp('2019-04-01 08:30:00'), 16: pd.Timestamp('2019-06-09 08:20:00'), 21: pd.Timestamp('2019-07-26 08:31:00'), 26: pd.Timestamp('2019-03-31 06:16:00'), 31: pd.Timestamp('2019-07-03 23:52:00'), 36: pd.Timestamp('2019-03-27 17:31:00')}, 'Arrv_Time': {1: pd.Timestamp('2019-03-30 23:23:00'), 6: pd.Timestamp('2019-05-16 23:22:00'), 11: pd.Timestamp('2019-04-01 14:22:00'), 16: pd.Timestamp('2019-06-09 14:18:00'), 21: pd.Timestamp('2019-07-26 14:23:00'), 26: pd.Timestamp('2019-06-18 05:00:00'), 31: pd.Timestamp('2019-07-04 05:36:00'), 36: pd.Timestamp('2019-03-27 23:23:00')}})