I have a dataframe that contains the origin-destination trips of users between different points (latitude/longitude), so we have Origin_X, Origin_Y and Destination_X, Destination_Y.

df:

Trip Origin_X  Origin_Y  Destination_X Destination_Y
1   -33.55682 -70.78614   -33.44007     -70.6552
2   -33.49097 -70.77741   -33.48908     -70.76263
3   -33.37108 -70.6711    -33.73425     -70.76278

I want to group together all the trips that lie within a radius of 1 km of each other at both the origin and the destination. Two trips can be grouped if the distance between their origins and the distance between their destinations are both d <= 1 km. To compute the distance between two coordinates I am using the haversine function.

from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
emax
  • Please check this question for a vectorised method to calculate haversine; you can add the result as a new distance column and then bucket/filter the df: http://stackoverflow.com/questions/25767596/using-haversine-formula-with-data-stored-in-a-pandas-dataframe – EdChum Apr 29 '16 at 10:22
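
A rough sketch of how the vectorised idea from the comment above could be applied to the pairwise problem. It assumes NumPy is available and that the *_X columns hold latitude and the *_Y columns hold longitude, which the question does not state explicitly. The helper name pairwise_haversine is made up for illustration. It builds the full trip-by-trip distance matrix by broadcasting, so memory use grows quadratically with the number of trips.

```python
import numpy as np

def pairwise_haversine(lat, lon, r=6371.0):
    """Return the matrix of great-circle distances (km) between all pairs of points."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))

# The three trips from the question (assumption: X = latitude, Y = longitude)
origin_lat = [-33.55682, -33.49097, -33.37108]
origin_lon = [-70.78614, -70.77741, -70.6711]
dest_lat = [-33.44007, -33.48908, -33.73425]
dest_lon = [-70.6552, -70.76263, -70.76278]

# near[i, j] is True when trips i and j are within 1 km at origin AND destination
near = ((pairwise_haversine(origin_lat, origin_lon) <= 1)
        & (pairwise_haversine(dest_lat, dest_lon) <= 1))
```

For the sample data only the diagonal of near is True, since no two of the three trips start and end within 1 km of each other.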

2 Answers

Here is how you can do it:

import pandas as pd
from math import radians, sin, cos, asin, sqrt

def haversine(row):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
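    # Read the coordinates by column label. As written, this treats the
    # *_X columns as longitude and the *_Y columns as latitude; swap the
    # pairs if your *_X columns hold latitude.
    lon1 = row['Origin_X']
    lat1 = row['Origin_Y']
    lon2 = row['Destination_X']
    lat2 = row['Destination_Y']
    # convert decimal degrees to radians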
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

# Copy the trip details provided in this question to the clipboard first

df = pd.read_clipboard()
df['dist'] = df.apply(haversine, axis=1)

print(df)

   Trip  Origin_X  Origin_Y  Destination_X  Destination_Y       dist
0     1 -33.55682 -70.78614      -33.44007      -70.65520  15.177680
1     2 -33.49097 -70.77741      -33.48908      -70.76263   1.644918
2     3 -33.37108 -70.67110      -33.73425      -70.76278  16.785898
# To group the trips by whether their distance is below 1 km
dfg = df.groupby(df['dist'] < 1)

# Just to select all the trips with a distance of less than 2 km
df[df['dist']<2]
   Trip  Origin_X  Origin_Y  Destination_X  Destination_Y      dist
1     2 -33.49097 -70.77741      -33.48908      -70.76263  1.644918
Abbas

You could iterate over each point, calculate its distance to all the other points, check whether that distance is below or equal to 1 km, and if so add it to a dictionary where the key is the origin point and the value is an array of all the close points...
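
The idea above could be sketched roughly as follows, using only the standard library. This is one possible reading, not a definitive implementation: it keys the dictionary by trip number rather than by the origin point itself, it assumes the *_X values are latitudes and the *_Y values are longitudes, and the function name close_trips is made up for illustration. Trip 4 is a hypothetical row, not from the question, added only so that one pair actually matches.

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# (trip, origin lat, origin lon, destination lat, destination lon)
trips = [
    (1, -33.55682, -70.78614, -33.44007, -70.6552),
    (2, -33.49097, -70.77741, -33.48908, -70.76263),
    (3, -33.37108, -70.6711, -33.73425, -70.76278),
    (4, -33.55700, -70.78600, -33.44000, -70.65500),  # hypothetical, close to trip 1
]

def close_trips(trips, max_km=1.0):
    """Map each trip to the trips within max_km at both origin and destination."""
    close = {trip[0]: [] for trip in trips}
    for i, (id_a, olat_a, olon_a, dlat_a, dlon_a) in enumerate(trips):
        # Compare each unordered pair once and record it in both directions
        for id_b, olat_b, olon_b, dlat_b, dlon_b in trips[i + 1:]:
            if (haversine(olon_a, olat_a, olon_b, olat_b) <= max_km
                    and haversine(dlon_a, dlat_a, dlon_b, dlat_b) <= max_km):
                close[id_a].append(id_b)
                close[id_b].append(id_a)
    return close

close = close_trips(trips)
```

Here close maps trips 1 and 4 to each other, while trips 2 and 3 get empty lists. The double loop compares every pair of trips, so the running time grows quadratically with the number of trips.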

Emma