I currently have a list of coordinates:

[(52.14847612092221, 0.33689512047881015),
 (52.14847612092221, 0.33689512047881015),
 (52.95756796776235, 0.38027099942700493),
 (51.78723479900971, -1.4214854900618064)
 ...]

I would like to split this list into 3 separate lists/dataframes according to which city each coordinate is closest to (in this case the coordinates are all in the UK and the 3 cities are Manchester, Cardiff and London).

As the end result, I would like the current single list of coordinates to be split into separate lists, ideally, but a dataframe with 3 columns would also be fine, e.g.:

 leeds                   cardiff                 london
(51.78723479900971,    (51.78723479900971,      (51.78723479900971,
 -1.4214854900618064)    -1.4214854900618064)    -1.4214854900618064) 

(those are obviously not correct coordinates!)

Hope that makes sense. It doesn't have to be overly accurate (I don't need to take into consideration the curvature of the earth or anything like that!)

I'm really not sure where to start with this. I'm very new to Python and would appreciate any help! Thanks in advance.

Thomas K
hsquared
  • Can you please show what you expect the output to look like? – Tom Pitts Sep 09 '16 at 23:06
  • I have added more detail, but I would like the output to be 3 lists(or this could be columns in a dataframe - which ever is easier), with the data in each list being the coordinates closest to the city (manchester, london, cardiff) – hsquared Sep 09 '16 at 23:16
  • @hsquared what have you done so far? have you searched SO, or Google? [for example](http://stackoverflow.com/questions/8858838/need-help-calculating-geographical-distance) – ivan7707 Sep 09 '16 at 23:25
  • Yes, I have looked at that but couldn't find another example that I could get working for my situation, e.g. the example you posted doesn't take the coordinates from one list and save them into different lists depending on which city each coordinate is closer to. I also don't need to take into account the curvature of the earth. – hsquared Sep 09 '16 at 23:35
  • This is a perfect job for KD-trees, especially if the number of coordinates is large: see scipy.spatial.cKDTree. It takes a bit of reading and hunting for examples on SO, but a really useful solution for fast nearest neighbour lookups. – Benjamin Sep 09 '16 at 23:47

2 Answers

This will get you started:

from geopy.geocoders import Nominatim

# Recent geopy releases require an explicit user_agent; this name is only a placeholder.
geolocator = Nominatim(user_agent="nearest-city-example")

places = ['london', 'cardiff', 'leeds']
coordinates = {}
for place in places:
    location = geolocator.geocode(place)  # one lookup per city
    coordinates[place] = (location.latitude, location.longitude)

>>> print(coordinates)
{'cardiff': (51.4816546, -3.1791933), 'leeds': (53.7974185, -1.543794), 'london': (51.5073219, -0.1276473)}

You can now set up the code for putting this in a pandas dataframe and calculating a distance metric between your coordinates and the city coordinates above.

OK, so now we want to compute the distances between a single point and what is a very small array (the coordinates).

Here's some code:

import numpy as np
single_point = [3, 4] # A coordinate
points = np.arange(20).reshape((10,2)) # Lots of other coordinates

dist = (points - single_point)**2
dist = np.sum(dist, axis=1)
dist = np.sqrt(dist)

From here there are any number of things you can do. You can sort it using numpy, or you can place it in a pandas dataframe and sort it there (though that's really just a wrapper for the numpy function, I believe). Use whichever you're more comfortable with.
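To tie the two snippets together, here is a minimal sketch (not the answerer's own code) that measures every point against every city in one go and then groups the points by the nearest city. The names `cities`, `nearest` and `closest` are made up for illustration, and the last two points are hypothetical additions so that each city ends up with at least one entry. It uses the same plain Euclidean distance on latitude/longitude as the snippet above.

```python
import numpy as np

# City coordinates as printed above (latitude, longitude).
cities = {
    'cardiff': (51.4816546, -3.1791933),
    'leeds': (53.7974185, -1.543794),
    'london': (51.5073219, -0.1276473),
}

points = np.array([
    (52.14847612092221, 0.33689512047881015),   # from the question
    (52.14847612092221, 0.33689512047881015),   # from the question
    (52.95756796776235, 0.38027099942700493),   # from the question
    (51.78723479900971, -1.4214854900618064),   # from the question
    (53.5, -2.0),                               # hypothetical point
    (51.5, -3.0),                               # hypothetical point
])

names = list(cities)
city_arr = np.array([cities[name] for name in names])  # shape (n_cities, 2)

# Broadcasting: (n_points, 1, 2) - (1, n_cities, 2) -> (n_points, n_cities, 2)
diff = points[:, None, :] - city_arr[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=2))  # shape (n_points, n_cities)

# Index of the closest city for every point.
nearest = dist.argmin(axis=1)

# One list of (lat, lon) tuples per city.
closest = {
    name: [tuple(p) for p in points[nearest == i].tolist()]
    for i, name in enumerate(names)
}
```

`closest['london']`, `closest['leeds']` and `closest['cardiff']` are then the three separate lists asked for. On ties, `argmin` simply keeps the first city in `names`.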

Astrid
  • Thanks for that it's helped so much! You're right though I am not sure how to compare and place in different lists depending on which is closer. Are you able to help with that too? Thanks in advance! – hsquared Sep 09 '16 at 23:45
  • Thanks again for your help, and sorry for being dumb, but how do I use both bits of code together? My current coordinates are in a list, so will I have to loop through them to compare the distance, then put them in a specific list depending on the outcome? – hsquared Sep 10 '16 at 00:05
0

This is a pretty brute-force approach, and not too adaptable. However, that can make it the easiest to understand, and it might be plenty efficient for the problem at hand. It also uses only pure Python, which may help you to understand some of Python's conventions.

points = [(52.14847612092221, 0.33689512047881015), (52.14847612092221, 0.33689512047881015), (52.95756796776235, 0.38027099942700493), (51.78723479900971, -1.4214854900618064), ...]

cardiff = (51.4816546, -3.1791933)
leeds = (53.7974185, -1.543794)
london = (51.5073219, -0.1276473)

def distance(pt, city):
    return ((pt[0] - city[0])**2 + (pt[1] - city[1])**2)**0.5

cardiff_pts = []
leeds_pts = []
london_pts = []
undefined_pts = []  # for points equidistant between two/three cities

for pt in points:
    d_cardiff = distance(pt, cardiff)
    d_leeds = distance(pt, leeds)
    d_london = distance(pt, london)
    if (d_cardiff < d_leeds) and (d_cardiff < d_london):
        cardiff_pts.append(pt)
    elif (d_leeds < d_cardiff) and (d_leeds < d_london):
        leeds_pts.append(pt)
    elif (d_london < d_cardiff) and (d_london < d_leeds):
        london_pts.append(pt)
    else:
        undefined_pts.append(pt)

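Note that this solution assumes the values are on a Cartesian reference frame, which latitude/longitude pairs are not. At UK latitudes a degree of longitude covers only about 0.6 times the ground distance of a degree of latitude, so the raw differences overweight east-west separation.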
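If you later want more cities, or a rough correction for the latitude/longitude issue, one possible variation (a sketch, not a replacement for the code above) keeps the cities in a dict and picks the nearest one with `min`. The function `split_by_nearest_city` and the `cos(latitude)` scaling are my own additions for illustration. The scaling is only an approximation that shrinks longitude differences to match latitude differences, and it is still not a true great-circle distance.

```python
import math

cities = {
    'cardiff': (51.4816546, -3.1791933),
    'leeds': (53.7974185, -1.543794),
    'london': (51.5073219, -0.1276473),
}

def distance(pt, city):
    """Approximate planar distance, measured in degrees of latitude."""
    mean_lat = math.radians((pt[0] + city[0]) / 2)
    dlat = pt[0] - city[0]
    # Assumption: scale longitude by cos(mean latitude) so that both axes
    # cover roughly the same ground distance per unit.
    dlon = (pt[1] - city[1]) * math.cos(mean_lat)
    return math.hypot(dlat, dlon)

def split_by_nearest_city(points, cities):
    """Return {city_name: [points closest to that city]}."""
    groups = {name: [] for name in cities}
    for pt in points:
        nearest = min(cities, key=lambda name: distance(pt, cities[name]))
        groups[nearest].append(pt)
    return groups

# The four sample points from the question.
points = [
    (52.14847612092221, 0.33689512047881015),
    (52.14847612092221, 0.33689512047881015),
    (52.95756796776235, 0.38027099942700493),
    (51.78723479900971, -1.4214854900618064),
]

groups = split_by_nearest_city(points, cities)
```

With the scaling applied, the third sample point is grouped with Leeds, whereas the unscaled distance above puts it with London, so the correction can matter for borderline points. Unlike the `undefined_pts` list above, `min` silently assigns an exact tie to whichever city comes first in the dict.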

Logan Byers