3

I have a pandas DataFrame with the following schema:

customer_id                                     int64
vehicle_type                                   object
pickup_place                                   object
place_category                                 object
how_long_it_took_to_order                      object
pickup_lat                                    float64
pickup_lon                                    float64
dropoff_lat                                   float64
dropoff_lon                                   float64
pickup_coord                                   object
dropoff_coord                                  object
dtype: object

I am trying to find the distance between the pickup and dropoff locations. I initially tried the approach from "Getting distance between two points based on latitude/longitude", which uses the haversine formula. When I tried to convert the degrees to radians using

df_post['lat1'] = radians(df_post['pickup_lat'])

I got this error:

TypeError: cannot convert the series to <class 'float'>

So I tried the approach in the third answer there, which uses the built-in distance function from the geopy.distance module. For that, I created tuples of the latitude and longitude:

df_post['pickup_coord'] = list(zip(df_post['pickup_lat'], df_post['pickup_lon']))
df_post['dropoff_coord'] = list(zip(df_post['dropoff_lat'], df_post['dropoff_lon']))

But when I tried the built-in function

df_post['pickup_dropoff_distance']=gd.VincentyDistance(df_post['pickup_coord'],df_post['dropoff_coord']).miles

I am getting a new error:

ValueError: When creating a Point from sequence, it must not have more than 3 items.

Can someone explain why each of these errors occurs and what a possible solution is?

jpp
Raj

3 Answers

4

The syntax for your distance calculator is geopy.distance.VincentyDistance(coords_1, coords_2).miles, where coords_1 and coords_2 are tuples.

To apply the function to each row in a dataframe, you need to use pd.DataFrame.apply:

import geopy.distance

def distancer(row):
    # Column names follow the schema in the question: pickup_lon, dropoff_lon
    coords_1 = (row['pickup_lat'], row['pickup_lon'])
    coords_2 = (row['dropoff_lat'], row['dropoff_lon'])
    return geopy.distance.VincentyDistance(coords_1, coords_2).miles

df_post['pickup_dropoff_distance'] = df_post.apply(distancer, axis=1)
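If you would rather not depend on geopy, the haversine formula mentioned in the question can be written with the standard library alone. The following is a minimal sketch, not a drop-in replacement for the geodesic calculation: the function name haversine_miles, the mean Earth radius of 3958.8 miles, and the sample rows are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical rows: (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon)
rows = [(0.0, 0.0, 0.0, 1.0), (40.0, -74.0, 40.0, -74.0)]
distances = [haversine_miles(*r) for r in rows]
```

Because the function takes plain numbers, it can be applied row by row in the same way as distancer, for example df_post.apply(lambda row: haversine_miles(row['pickup_lat'], row['pickup_lon'], row['dropoff_lat'], row['dropoff_lon']), axis=1). Haversine assumes a spherical Earth, so its results will differ slightly from geopy's ellipsoidal distances.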
jpp
-1
import geopy.distance

def distancer(row):
    # Build (latitude, longitude) tuples from the columns in the question's schema
    coords_1 = (row['pickup_lat'], row['pickup_lon'])
    coords_2 = (row['dropoff_lat'], row['dropoff_lon'])
    return geopy.distance.geodesic(coords_1, coords_2).km

df_distance['pickup_dropoff_distance'] = df_distance.apply(distancer, axis=1)
cottontail
-2

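math.radians only accepts a single number, so casting the column to float does not help. Try NumPy's radians instead, which works element-wise on a Series:

import numpy as np
df_post['lat1'] = np.radians(df_post['pickup_lat'].astype(float))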
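To see where the first TypeError in the question comes from, here is a small sketch that uses a plain list as a stand-in for the pickup_lat column (the values are made up). It only illustrates that math.radians is a scalar function and that the conversion has to be done element by element.

```python
import math

# Hypothetical values standing in for df_post['pickup_lat']
latitudes = [12.97, 13.08]

try:
    math.radians(latitudes)  # passing a whole sequence instead of one number
    scalar_only = False
except TypeError:
    scalar_only = True  # math.radians rejects anything that is not a single number

# Converting one element at a time works
lat_radians = [math.radians(x) for x in latitudes]
```

With pandas, np.radians(df_post['pickup_lat']) performs the same element-wise conversion in a single call.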
Usama Jamil