I want calculate the distance from each building in df1 to each city in df2. The issue is that I have ~30 rows in df1 and ~30,000 rows in df2.
The desired output is set up in output_df.
How do I go about this? Is it possible using Geopy or would that take too long?
df1 = pd.DataFrame({'Building': ['One World Trade Center', 'Central Park Tower','Willis Tower', '111 West 57th Street', 'One Vanderbilt'],
'Latitude': [40.713005, 40.765957, 41.878872, 40.764760, 40.752971],
'Longitude': [-74.013190, -73.980844, -87.635908, -73.977581, -73.978541],
})
df2 = pd.DataFrame({'City': ['Santa Barbra, CA', 'Washington, D.C.'],
'Latitude': [34.024212, 38.9072],
'Longitude': [-118.496475, -77.0369],
})
output_df = pd.DataFrame({'City': ['Santa Barbra', 'Santa Barbra', 'Santa Barbra', 'Santa Barbra', 'Santa Barbra', 'Washington D.C.', 'Washington D.C.', 'Washington D.C.', 'Washington D.C.', 'Washington D.C.'],
'Building': ['One World Trade Center', 'Central Park Tower', 'Willis Tower', '111 West 57th Street', 'One Vanderbilt', 'One World Trade Center', 'Central Park Tower', 'Willis Tower', '111 West 57th Street', 'One Vanderbilt'],
'Latitude': [40.713005, 40.765957, 41.878872, 40.764760, 40.752971, 40.713005, 40.765957, 41.878872, 40.764760, 40.752971],
'Longitude': [-74.013190, -73.980844, -87.635908, -73.977581, -73.978541, -74.013190, -73.980844, -87.635908, -73.977581, -73.978541],
'Distance': ['dis_to_SB', 'dis_to_SB', 'dis_to_SB', 'dis_to_SB', 'dis_to_SB', 'dis_to_DC', 'dis_to_DC', 'dis_to_DC', 'dis_to_DC', 'dis_to_DC']})
output_df.set_index(['City', 'Building'])