
I want to calculate the haversine distance between two points and make a new pandas series from the result of the computation. My input comes from four different pandas series, namely: pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude.

A function from the haversine module that calculates the distance has the following signature:

p1 = (lat1, lon1)  # each point is a (latitude, longitude) tuple in decimal degrees
p2 = (lat2, lon2)
haversine(p1, p2)  # returns the distance between p1 and p2
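
For reference, here is a minimal pure-Python sketch of what such a function computes. It is not the haversine module's own code: the name `haversine_km` is hypothetical, the Earth radius of 6371.0088 km is an assumed mean value, and points are assumed to be (latitude, longitude) tuples in decimal degrees. The example call uses the first row of the data set quoted in the comments.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, in kilometres


def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Hypothetical helper for illustration only.
    """
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Pickup and dropoff of the sample row, each as (latitude, longitude)
print(haversine_km((40.7679, -73.9822), (40.7656, -73.9646)))
```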

I'm curious if there is a fast, pythonic way to do this.

Here's my naive solution:

import numpy as np
from haversine import haversine

# haversine() expects each point as (latitude, longitude), so latitude goes first
pula = train['pickup_latitude'].values
pulo = train['pickup_longitude'].values
dola = train['dropoff_latitude'].values
dolo = train['dropoff_longitude'].values

pickup_list = list(zip(pula, pulo))
dropoff_list = list(zip(dola, dolo))

# One haversine() call per row; the result is in kilometres by default
haversines = []
for pickup, dropoff in zip(pickup_list, dropoff_list):
    haversines.append(haversine(pickup, dropoff))

train['distance'] = np.asarray(haversines)
SkogensKonung
  • Could you indulge us and provide some sample data, so we could copy-n-paste your code and run it? It would save us effort when constructing answers - and ensure that the answers are really useful. – hpaulj Jan 24 '19 at 22:45
  • If you're working on vehicle routing, this structure is not useful. You want a 2D matrix of location IDs and, if necessary, a mapping of location ID to the lat/long in a separate dict. – roganjosh Jan 24 '19 at 22:51
  • The data comes from a Kaggle competition, namely https://www.kaggle.com/c/nyc-taxi-trip-duration. I want to compute the distance because I'm guessing that it might be an important feature for the model. First row of the data set (relevant columns only): pickup_datetime = 2016-03-14 17:24:55, dropoff_datetime = 2016-03-14 17:32:30, passenger_count = 1, pickup_longitude = -73.9822, pickup_latitude = 40.7679, dropoff_longitude = -73.9646, dropoff_latitude = 40.7656, trip_duration = 455 – SkogensKonung Jan 24 '19 at 22:59
  • Huh, what a strange competition. This should be done in-house and can make use of a lot of third-party libraries that do these things accurately by road distance and speed limit, e.g. Jsprit (contributor) and OSRM, before layering this logic on top. Having worked on several systems for national delivery companies, it looks like they're throwing money away; a decent system takes at least a year, but I guess that's not your issue :P – roganjosh Jan 24 '19 at 23:10
  • This question is specifically about a haversine implementation in pandas: https://stackoverflow.com/questions/25767596/vectorised-haversine-formula-with-a-pandas-dataframe – Plasma Jan 25 '19 at 09:25
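
A vectorised approach of the kind the last comment points to might look like the sketch below. It is not taken from the linked question: the function name `haversine_np` is hypothetical, the Earth radius of 6371.0088 km is an assumed mean value, and all inputs are assumed to be in decimal degrees. It evaluates the haversine formula on whole arrays with NumPy instead of looping over rows in Python.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, in kilometres


def haversine_np(lat1, lon1, lat2, lon2):
    """Element-wise great-circle distance in km for array-like inputs in degrees.

    Hypothetical helper for illustration only.
    """
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# With the DataFrame from the question, the call would be along these lines:
# train['distance'] = haversine_np(
#     train['pickup_latitude'], train['pickup_longitude'],
#     train['dropoff_latitude'], train['dropoff_longitude'],
# )
```

Note the argument order: latitudes and longitudes are passed as separate arrays, latitude first, so no intermediate list of tuples is needed.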
