As part of a Data Science course at uni, we were asked to work out the most remote capital city. I'm asking this question here because I'm not happy with my answer, but I wasn't given a better alternative after submission.
As I understand it, the task has three parts:
- Acquire capital city location data
- Create distance function for lat/long pairs
- Use pandas to find the minimum distance from a capital city to any other
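
For reference, the distance function in the second step could look something like the sketch below. It assumes coordinates in decimal degrees and a mean Earth radius of 6371 km; the constant name `EARTH_RADIUS_KM` is just illustrative, and the argument order matches how `haversine` is called further down.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # The trig functions expect radians, so convert all four inputs first
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine formula: a is the squared half-chord length between the points
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```
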
The first two tasks were trivial. However, I struggled to solve the third without resorting to iterators. The distance function takes a pair of lat/long values, so I need a way to apply it to each row against every other row.
from math import inf

capitals['closest'] = inf
for idx, row_x in capitals.iterrows():
    # Distance from row_x to every other capital; the city itself is masked with inf
    capitals.at[idx, 'closest'] = capitals.apply(
        lambda row_y: haversine(row_x['lat'], row_x['lng'], row_y['lat'], row_y['lng'])
        if row_x['city'] != row_y['city']
        else inf,
        axis=1,
    ).min()
Is there a way to nest calls to the DataFrame apply method? Is there some other way to create row-wise data that's derived from all other rows?
Edit: Here is my final answer, which previously used an iterator (see commit history), but has since been updated with the better solution: https://github.com/maccaroo/worldcities/blob/main/world_cities.ipynb
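
For anyone landing here, one iterator-free way to get row-wise values derived from all other rows is to build the full pairwise distance matrix with NumPy broadcasting. This is only a sketch under my own assumptions and not necessarily what the linked notebook does: the function name `nearest_neighbour_km` is made up, the radius is the usual 6371 km mean value, the sample data is a toy set of three points on the equator, and the column names `lat`, `lng` and `city` follow the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def nearest_neighbour_km(df):
    """Return, for each row, the haversine distance to the nearest other row."""
    lat = np.radians(df['lat'].to_numpy(dtype=float))
    lng = np.radians(df['lng'].to_numpy(dtype=float))
    # A column vector minus a row vector broadcasts to an n x n grid of differences
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    # Clip guards against tiny floating-point overshoot before the square root
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    np.fill_diagonal(dist, np.inf)  # a city must not count as its own neighbour
    return pd.Series(dist.min(axis=1), index=df.index)


# Toy data for illustration only: three made-up points on the equator
capitals = pd.DataFrame({
    'city': ['A', 'B', 'C'],
    'lat': [0.0, 0.0, 0.0],
    'lng': [0.0, 10.0, 50.0],
})
capitals['closest'] = nearest_neighbour_km(capitals)
# The most remote city is the one whose nearest neighbour is farthest away
most_remote = capitals.loc[capitals['closest'].idxmax(), 'city']
```

The matrix needs memory proportional to the square of the row count, which is fine for a couple of hundred capitals but would not scale to very large tables. Masking the diagonal also avoids comparing city names, so duplicate names do not matter.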