As part of a Data Science course at uni, we were asked to work out the most remote capital city. I'm asking this question here because I'm not happy with my answer, but I wasn't given a better alternative after submission.

As I understand it, the task requires 3 parts:

  1. Acquire capital city location data
  2. Create distance function for lat/long pairs
  3. Use pandas to find the minimum distance from a capital city to any other

The first 2 tasks were trivial. However, I struggled to find a way to solve the 3rd task without resorting to iterators. The distance function takes a pair of lat/long values, so I need a way to apply it to each row against every other row.
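For reference, a distance function of the kind described in step 2 might look like the sketch below. The haversine implementation actually used isn't shown here, so the argument order (lat1, lng1, lat2, lng2) is inferred from how it is called in the code, and the Earth radius constant and kilometre units are assumptions.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, in kilometres


def haversine(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    # Convert all four angles from degrees to radians.
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine formula: a is the squared half-chord length between the points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

The function is symmetric in its two points, so the distance from A to B equals the distance from B to A.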

capitals['closest'] = inf
for idx, row_x in capitals.iterrows():
    capitals.at[idx,'closest'] = capitals.apply(lambda row_y: 
                                 haversine(row_x['lat'],row_x['lng'],row_y['lat'],row_y['lng'])
                                 if row_x['city'] != row_y['city']
                                 else inf
                                , axis=1).min()

Is there a way to nest calls to the DataFrame apply method? Is there some other way to create row-wise data that's derived from all other rows?

Edit: Here's my final answer, which previously used an iterator (see commit history) but has since been updated with the better solution: https://github.com/maccaroo/worldcities/blob/main/world_cities.ipynb

maccaroo

1 Answer


I found the solution in the 'Similar questions' search as I was about to post, but I feel my answer is different enough to warrant a post.

First, here's the post which (mostly) answered my question. However, I kept getting this error: KeyError: ('city', 'occurred at index city', 'occurred at index city')

This article got me over the line. The fix was passing axis=1 to both apply calls, which tells apply to hand each row to the function. The values in that row are then looked up by column name rather than by row index.
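To illustrate the difference, here is a minimal sketch on a made-up two-row frame (the data and variable names are hypothetical, not from my notebook). With axis=1 the function sees one row at a time; with the default axis=0 it sees one column at a time, so a lookup by column name raises a KeyError.

```python
import pandas as pd

# Hypothetical two-row frame, for illustration only.
df = pd.DataFrame({"city": ["A", "B"], "lat": [0.0, 10.0]})

# axis=1: the function receives each row as a Series indexed by column labels,
# so row["city"] works.
by_row = df.apply(lambda row: row["city"], axis=1)

# Default axis=0: the function receives each column as a Series indexed by the
# row labels (0, 1), so looking up "city" fails.
try:
    df.apply(lambda col: col["city"])
    failed = False
except KeyError:
    failed = True
```

Here by_row holds the city names, and failed ends up True because the default call raised a KeyError.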

Here's my final code:

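from math import inf

# For each capital (row), compute the distance to every other capital (x)
# and keep the smallest. A city's distance to itself is replaced by inf so
# that it never wins the min().
capitals['closest'] = capitals.apply(lambda row:
    capitals.apply(lambda x:
                   haversine(row['lat'],row['lng'],x['lat'],x['lng'])
                   if row['city'] != x['city']
                   else inf
              ,axis=1).min()
    ,axis=1)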
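Nested apply still calls the Python function once per pair of rows. As a possible alternative (a sketch, not what my notebook does), the whole distance matrix can be built at once with NumPy broadcasting. The helper name pairwise_haversine_km, the 6371 km radius, and the three placeholder points are all assumptions made for this example. Note that it excludes self-distances by position (the diagonal) rather than by comparing city names.

```python
import numpy as np
import pandas as pd


def pairwise_haversine_km(lat_deg, lng_deg, radius_km=6371.0):
    """Return an (n, n) matrix of great-circle distances in km."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lng = np.radians(np.asarray(lng_deg, dtype=float))
    # Broadcasting a column vector against a row vector yields all pairs.
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    # Clip guards against tiny floating-point overshoot outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Placeholder points on the equator, not real capitals.
capitals = pd.DataFrame({
    "city": ["P", "Q", "R"],
    "lat": [0.0, 0.0, 0.0],
    "lng": [0.0, 1.0, 10.0],
})

dist = pairwise_haversine_km(capitals["lat"], capitals["lng"])
np.fill_diagonal(dist, np.inf)          # ignore each city's distance to itself
capitals["closest"] = dist.min(axis=1)  # nearest other city, per row

# The most remote city is the one whose nearest neighbour is farthest away.
most_remote = capitals.loc[capitals["closest"].idxmax(), "city"]
```

The matrix needs memory proportional to the square of the number of rows, which is fine for a few hundred capitals but would not scale to very large tables.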
maccaroo