I did this by
- calculating the distances (in miles) and time relative to the first row
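- My logic
  - if n rows are within x minutes/miles of the first row, then those n rows are within at most 2x minutes/miles of each other (triangle inequality), so filtering against the first row only approximates the pairwise condition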
- filtering the data using the required distance and time conditions
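The reasoning above compares every row with the first row only. The following is a minimal sketch, using made-up timestamps rather than the data in this answer, of how far apart two such rows can be. The names `x`, `ref`, `a` and `b` are hypothetical and chosen just for this illustration.

```python
import pandas as pd

x = pd.Timedelta(minutes=25)                # hypothetical threshold
ref = pd.Timestamp('2012-10-01 12:00:00')   # stands in for the first row
a = ref - pd.Timedelta(minutes=20)          # 20 minutes before the reference
b = ref + pd.Timedelta(minutes=20)          # 20 minutes after the reference

# Both rows are within x of the reference row...
within_ref = abs(a - ref) <= x and abs(b - ref) <= x
# ...but the gap between them is 40 minutes
gap = abs(a - b)

print(within_ref)    # True
print(gap <= x)      # False
print(gap <= 2 * x)  # True
```

If the pairwise condition has to hold strictly, the rows that pass the first-row filter can be checked against each other afterwards.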
Generate some dummy data
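from random import uniform
import numpy as np
import pandas as pd
# Generate random (longitude, latitude) points, in decimal degrees
def newpoint():
    return uniform(-180, 180), uniform(-90, 90)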
home_points = (newpoint() for x in range(289))
work_points = (newpoint() for x in range(289))
df = pd.DataFrame(home_points, columns=['Home_Longitude', 'Home_Latitude'])
df[['Work_Longitude', 'Work_Latitude']] = pd.DataFrame(work_points)
# Insert `ID` column as sequence of integers
df.insert(0, 'ID', range(289))
# Generate random datetimes, separated by 5 minute intervals
# (you can choose your own interval)
times = pd.date_range('2012-10-01', periods=289, freq='5min')
df.insert(1, 'Created_Time', times)
print(df.head())
ID Created_Time Home_Longitude Home_Latitude Work_Longitude Work_Latitude
0 0 2012-10-01 00:00:00 -48.885981 -39.412351 -68.756244 24.739860
1 1 2012-10-01 00:05:00 58.584893 59.851739 -119.978429 -87.687858
2 2 2012-10-01 00:10:00 -18.623484 85.435248 -14.204142 -3.693993
3 3 2012-10-01 00:15:00 -29.721788 71.671103 -69.833253 -12.446204
4 4 2012-10-01 00:20:00 168.257968 -13.247833 60.979050 -18.393925
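The dummy data above changes on every run. If reproducible numbers are wanted, a seeded NumPy generator could be used instead. This is a sketch of an alternative, not what the code above does; the names `df_seeded` and `rng` and the seed value 42 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

n = 289
rng = np.random.default_rng(42)  # arbitrary seed, only for repeatability

# Same column layout as the dummy data above, built in one step
df_seeded = pd.DataFrame({
    'ID': range(n),
    'Created_Time': pd.date_range('2012-10-01', periods=n, freq='5min'),
    'Home_Longitude': rng.uniform(-180, 180, n),
    'Home_Latitude': rng.uniform(-90, 90, n),
    'Work_Longitude': rng.uniform(-180, 180, n),
    'Work_Latitude': rng.uniform(-90, 90, n),
})
print(df_seeded.head())
```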
Create a Python helper function implementing the vectorized haversine distance formula (result in km)
def haversine(lat1, lon1, lat2, lon2, to_radians=False, earth_radius=6371):
    """
    Slightly modified version of http://stackoverflow.com/a/29546836/2901002
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians).
    The result is in km for the default earth_radius.
    All (lat, lon) coordinates must have numeric dtypes; each may be a scalar
    or a Series/array, and any Series/arrays must be of equal length.
    """
    if to_radians:
        # Convert each argument separately so Series and scalars can be mixed
        lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))
Calculate distances (relative to the first row) in km, using the haversine formula, then convert km to miles. Note the (lat, lon) argument order, and `to_radians=True` because the coordinates are in decimal degrees
# 1 km = 0.621371 miles
df['Home_dist_miles'] = \
    haversine(df.Home_Latitude, df.Home_Longitude,
              df.loc[0, 'Home_Latitude'], df.loc[0, 'Home_Longitude'],
              to_radians=True)*0.621371
df['Work_dist_miles'] = \
    haversine(df.Work_Latitude, df.Work_Longitude,
              df.loc[0, 'Work_Latitude'], df.loc[0, 'Work_Longitude'],
              to_radians=True)*0.621371
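A quick way to gain confidence in a haversine helper is to compare it with distances that are known in closed form. The sketch below uses its own compact helper, `haversine_km` (a hypothetical name, which always takes decimal degrees in (lat, lon) order), and checks two cases on a sphere of radius 6371 km: one degree of latitude, and a quarter of the equator.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km; all inputs in decimal degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

one_degree = haversine_km(0, 0, 1, 0)   # 1 degree of latitude along a meridian
quarter = haversine_km(0, 0, 0, 90)     # a quarter of the way around the equator

# Arc length on a sphere is radius * angle (in radians)
print(np.isclose(one_degree, 6371 * np.pi / 180))  # True
print(np.isclose(quarter, 6371 * np.pi / 2))       # True
```

Passing degrees without converting them, or swapping latitude and longitude, would make checks like these fail, which is why they are a useful guard.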
Calculate time differences, in minutes (relative to first row)
- for the dummy data here, the time differences will be in multiples of 5 minutes (but in real data, they could be anything)
df['time'] = df['Created_Time'] - df.loc[0, 'Created_Time']
df['time_min'] = df['time'].dt.total_seconds() / 60
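The conversion to minutes can be checked on a tiny hand-made series. This sketch uses `dt.total_seconds()`, which accounts for whole days as well as the seconds within a day; the timestamps and the names `times`, `elapsed` and `minutes` are made up for the example.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    '2012-10-01 00:00:00',
    '2012-10-01 00:07:30',
    '2012-10-02 00:00:00',
]))

# Elapsed time relative to the first entry, as a timedelta Series
elapsed = times - times.iloc[0]
# Convert to minutes
minutes = elapsed.dt.total_seconds() / 60

print(minutes.tolist())  # [0.0, 7.5, 1440.0]
```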
Apply filters (method 1) and then select any 2 rows that satisfy the conditions stated in the OP
home_filter = df['Home_dist_miles']<=12000 # within 12,000 miles
work_filter = df['Work_dist_miles']<=8000 # within 8,000 miles
time_filter = df['time_min']<=25 # within 25 minutes
df_filtered = df.loc[(home_filter) & (work_filter) & (time_filter)]
# Select any 2 rows that satisfy required conditions
df_any2rows = df_filtered.sample(n=2)
print(df_any2rows)
ID Created_Time Home_Longitude Home_Latitude Work_Longitude Work_Latitude Home_dist_miles Work_dist_miles time time_min
0 0 2012-10-01 00:00:00 -168.956448 -42.970705 -6.340945 -12.749469 0.000000 0.000000 00:00:00 0.0
4 4 2012-10-01 00:20:00 -73.120352 13.748187 -36.953587 23.528789 6259.078588 5939.425019 00:20:00 20.0
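Because the filter measures distance from the first row only, it may be worth confirming the selected rows against each other. Here is a minimal sketch of an all-pairs distance matrix using NumPy broadcasting; `pairwise_haversine_miles` is a hypothetical helper, not part of the code above, and the three sample points are invented.

```python
import numpy as np

def pairwise_haversine_miles(lat, lon, earth_radius_km=6371):
    """Hypothetical helper: all-pairs great-circle distances in miles.

    lat and lon are equal-length sequences in decimal degrees.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # Broadcasting a column against a row gives every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against tiny floating-point overshoot
    return earth_radius_km * 2 * np.arcsin(np.sqrt(a)) * 0.621371

# Three invented points on the equator: the reference, 1 degree east, 1 degree west
lats = [0.0, 0.0, 0.0]
lons = [0.0, 1.0, -1.0]
d = pairwise_haversine_miles(lats, lons)
print(d.round(1))
```

With the filtered frame, something like `pairwise_haversine_miles(df_filtered['Home_Latitude'], df_filtered['Home_Longitude'])` would give the home-to-home distances between all remaining rows, which can then be compared with the threshold.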
Apply filters (method 2) and then select any 2 rows that satisfy the conditions stated in the OP
multi_query = """Home_dist_miles<=12000 & \
Work_dist_miles<=8000 & \
time_min<=25"""
df_filtered = df.query(multi_query)
# Select any 2 rows that satisfy required conditions
df_any2rows = df_filtered.sample(n=2)
print(df_any2rows)
ID Created_Time Home_Longitude Home_Latitude Work_Longitude Work_Latitude Home_dist_miles Work_dist_miles time time_min
0 0 2012-10-01 00:00:00 -168.956448 -42.970705 -6.340945 -12.749469 0.000000 0.000000 00:00:00 0.0
4 4 2012-10-01 00:20:00 -73.120352 13.748187 -36.953587 23.528789 6259.078588 5939.425019 00:20:00 20.0
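When the thresholds need to vary, `DataFrame.query` can read local Python variables through the `@` prefix instead of hard-coding numbers in the string. A small sketch on an invented three-row frame follows; `toy`, `max_home`, `max_work` and `max_minutes` are hypothetical names, and the values are not taken from the data above.

```python
import pandas as pd

# Invented values, only to exercise the filter
toy = pd.DataFrame({
    'Home_dist_miles': [0.0, 6000.0, 12100.0],
    'Work_dist_miles': [0.0, 5000.0, 100.0],
    'time_min': [0.0, 20.0, 30.0],
})

max_home, max_work, max_minutes = 12000, 8000, 25

# @name refers to a Python variable in the calling scope
toy_filtered = toy.query(
    'Home_dist_miles <= @max_home and '
    'Work_dist_miles <= @max_work and '
    'time_min <= @max_minutes'
)
print(toy_filtered.index.tolist())  # [0, 1]
```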