
I've got two dataframes, each with a set of coordinates. Dataframe 1 is a list of biomass sites, with coordinates in columns 'lat' and 'lng'. Dataframe 2 is a list of postcode coordinates, linked to sale price, with coordinates in columns 'pc_lat' and 'pc_lng'.

I've used this Stack Overflow question to work out the closest biomass site to each property. This is the code I am using:

import numpy as np

def dist(lat1, long1, lat2, long2):
    return np.abs((lat1-lat2)+(long1-long2))

def find_site(lat, long):
    distances = biomass.apply(
        lambda row: dist(lat, long, row['lat'], row['lng']), 
        axis=1)
    return biomass.loc[distances.idxmin(),'Site Name']

hp1995['BiomassSite'] = hp1995.apply(
    lambda row: find_site(row['pc_lat'], row['pc_long']), 
    axis=1)

print(hp1995.head())

This has worked well, in that I've got the name of the closest biomass generation site. However, I also want to know the distance calculated between each property and its closest site.

  1. How would I calculate the distance?

  2. What unit would the output distance be in? I am trying to find properties within 2 km of a biomass site.

DarkCygnus
christaylor

1 Answer


To calculate the distance between two geographic coordinates you should use the Haversine formula. Based on this page, I have implemented the following function:

import math
def distanceBetweenCm(lat1, lon1, lat2, lon2):
    dLat = math.radians(lat2-lat1)
    dLon = math.radians(lon2-lon1)

    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)

    a = math.sin(dLat/2) * math.sin(dLat/2) + math.sin(dLon/2) * math.sin(dLon/2) * math.cos(lat1) * math.cos(lat2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    # 6371 km is the Earth's mean radius; multiply by 100,000 to convert km to cm
    return c * 6371 * 100000

You can modify it to return different units by changing the multiplier. In the example, multiplying by 100,000 (the number of centimeters in a kilometer) gives the distance in centimeters. Without that factor, `c * 6371` is the distance in km, which is the unit to compare against your 2 km threshold. From there you can perform further unit conversions if necessary.
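As a minimal sketch of how this could answer the original question, the version below returns kilometers and reports the nearest site together with its distance. The names `haversine_km` and `nearest_site` and the sample coordinates are hypothetical, and plain tuples stand in for the rows of the `biomass` dataframe.

```python
import math

EARTH_RADIUS_KM = 6371  # mean Earth radius, same constant as above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    d_lat = math.radians(lat2 - lat1)
    d_lon = math.radians(lon2 - lon1)
    lat1 = math.radians(lat1)
    lat2 = math.radians(lat2)
    a = (math.sin(d_lat / 2) ** 2
         + math.sin(d_lon / 2) ** 2 * math.cos(lat1) * math.cos(lat2))
    return 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)) * EARTH_RADIUS_KM


def nearest_site(lat, lon, sites):
    """Return (name, distance_km) for the closest site.

    `sites` is an iterable of (name, lat, lon) tuples, a hypothetical
    stand-in for the dataframe rows.
    """
    return min(
        ((name, haversine_km(lat, lon, s_lat, s_lon)) for name, s_lat, s_lon in sites),
        key=lambda pair: pair[1],
    )


# Made-up coordinates, for illustration only
sites = [("Site A", 51.50, -0.12), ("Site B", 53.48, -2.24)]
name, km = nearest_site(51.51, -0.13, sites)
within_2km = km <= 2.0  # the 2 km threshold from the question
```

With the dataframes from the question, the same distance function could be called from the `apply` lambda in place of `dist`, and the returned kilometer value compared against 2 to pick out the properties of interest.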

Edit: As suggested in the comments, one possible optimization is to use the power operator instead of repeating each multiplication, like this:

a = math.sin(dLat/2)**2 + math.sin(dLon/2)**2 * math.cos(lat1) * math.cos(lat2)

Take a look at this question to read more about the relative speed of the different ways of calculating powers in Python.
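Whether the power form is actually faster is best measured rather than assumed. The sketch below times both forms with the standard `timeit` module; the function names and the sample arguments are hypothetical. Any gain would plausibly come from evaluating each sine once instead of twice, but that is an assumption to check on your own machine.

```python
import math
import timeit


def a_multiply(d_lat, d_lon, lat1, lat2):
    # Original form: each sine is evaluated twice
    return (math.sin(d_lat / 2) * math.sin(d_lat / 2)
            + math.sin(d_lon / 2) * math.sin(d_lon / 2) * math.cos(lat1) * math.cos(lat2))


def a_power(d_lat, d_lon, lat1, lat2):
    # Edited form: each sine is evaluated once, then squared
    return (math.sin(d_lat / 2) ** 2
            + math.sin(d_lon / 2) ** 2 * math.cos(lat1) * math.cos(lat2))


args = (0.001, 0.002, 0.9, 0.91)  # arbitrary sample values in radians

t_multiply = timeit.timeit(lambda: a_multiply(*args), number=100_000)
t_power = timeit.timeit(lambda: a_power(*args), number=100_000)
print(f"multiply: {t_multiply:.4f}s  power: {t_power:.4f}s")
```

Both functions compute the same value, so the choice only affects speed and readability.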

DarkCygnus
  • Glad I could help. If this solved your question you should consider accepting this answer so it is useful for future users with similar questions. – DarkCygnus Jul 04 '17 at 17:00
  • Full points for knowing haversines. I think you could improve the efficiency by using `a = math.sin(dLat/2)**2 + math.sin(dLon/2)**2 * math.cos(lat1) * math.cos(lat2)` – Shawn Mehan Jul 04 '17 at 17:00
  • @ShawnMehan great suggestion, will edit to add it to the answer – DarkCygnus Jul 04 '17 at 17:02
  • Just one more thing - to get distance in kilometres, I need to multiply it by 20000000000 (100000 * 200000). One example result is '8.441422e+11' – christaylor Jul 04 '17 at 17:06
  • What does that value mean? – christaylor Jul 04 '17 at 17:06
  • Okay, I've just realised that I've flipped this notation completely. – christaylor Jul 04 '17 at 17:10
  • @CTaylor19 to obtain units in km you should just do `return c * 6371` (as indicated in the answer), that is, with no multiplication by powers of 10. When multiplying by 100k, as in the example code, you obtain the distance in centimeters. – DarkCygnus Jul 04 '17 at 17:10
  • OK, got it! Thanks – christaylor Jul 04 '17 at 17:11
  • No problem, good luck with your coding. If you found this answer solved your question remember to [accept](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) it so it helps future users. – DarkCygnus Jul 04 '17 at 17:15