I have a data set that contains the latitude/longitude of two points in four columns, and I am trying to calculate the distance between them in a newly added column using geopy.distance.

The calculation works fine for a single pair of values but fails when I apply it to the whole columns.

import pandas as pd
from geopy import distance

sub_set = main[['Site_1','Site_Longitude_1','Site_Latitude_1','Site_2','Site_Longitude_2','Site_Latitude_2']]

lat1 = sub_set['Site_Latitude_1']
lat2 = sub_set['Site_Latitude_2']
long1 = sub_set['Site_Longitude_1']
long2 = sub_set['Site_Longitude_2']

The data frame sub_set is as follows

  Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2
0      A      -118.645167       34.237917     A2     -118.6499422     34.24973484
1      A      -118.645167       34.237917     A2     -118.6499422     34.24973484
2      B      -118.626659       34.224762     A2     -118.6499422     34.24973484
3      B      -118.626659       34.224762     A2     -118.6499422     34.24973484
4      B      -118.626659       34.224762     A2     -118.6499422     34.24973484

On executing,

sub_set['Distance'] = distance.distance((lat1,long1),(lat2,long2)).miles

the following error message is thrown,

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
Atif
  • Here is the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Atif Sep 09 '19 at 20:22
  • please add it directly in the question – PRMoureu Sep 09 '19 at 20:22
  • You need to pass tuples of values, not tuples of Series. My understanding is that you need to loop through the rows for this sort of calculation, unless you're willing to use a vectorized implementation of [Haversine](https://stackoverflow.com/questions/29545704/fast-haversine-approximation-python-pandas) – ALollz Sep 09 '19 at 20:28
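
For reference, here is a minimal sketch of the vectorized Haversine approach mentioned in the last comment, written with NumPy. The function name `haversine_miles` and the Earth radius of 3958.8 miles are assumptions made for illustration, not part of the original post. Haversine treats the Earth as a sphere, so the results should be close to, but not identical to, geopy's default geodesic distance.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius; not from the original post


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, computed element-wise.

    Accepts scalars, lists, arrays, or pandas Series and returns NumPy values.
    """
    # Convert every input from degrees to radians.
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))


# Coordinates taken from rows 0 and 2 of the question's data frame.
lat1 = [34.237917, 34.224762]
lon1 = [-118.645167, -118.626659]
lat2 = [34.24973484, 34.24973484]
lon2 = [-118.6499422, -118.6499422]
miles = haversine_miles(lat1, lon1, lat2, lon2)
```

With the question's variables this would be called as `sub_set['Distance'] = haversine_miles(lat1, long1, lat2, long2)`, which avoids a Python-level loop over the rows.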

2 Answers

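  • The following performs the row-wise calculation you want.
  • The sub_set slice and the separate lat/long variables are not required.
  • This is a long line, but selecting the four columns by name means it does not depend on the column order of df.
  • x.iloc[...] is used for positional access, because plain x[0] on a label-indexed row is deprecated in recent pandas versions.
df['Distance'] = df[['Site_Latitude_1', 'Site_Longitude_1', 'Site_Latitude_2', 'Site_Longitude_2']].apply(lambda x: distance.distance((x.iloc[0], x.iloc[1]), (x.iloc[2], x.iloc[3])).miles, axis=1)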

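A shorter line of code:

  • Make certain each position matches the intended column in df: here 2 and 1 are Site_Latitude_1 and Site_Longitude_1, and 5 and 4 are Site_Latitude_2 and Site_Longitude_2.
df['Distance'] = df.apply(lambda x: distance.distance((x.iloc[2], x.iloc[1]), (x.iloc[5], x.iloc[4])).miles, axis=1)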

Output:

  Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2  Distance
0      A      -118.645167       34.237917     A2     -118.6499422     34.24973484  0.859202
1      A      -118.645167       34.237917     A2     -118.6499422     34.24973484  0.859202
2      B      -118.626659       34.224762     A2     -118.6499422     34.24973484  2.177003
3      B      -118.626659       34.224762     A2     -118.6499422     34.24973484  2.177003
4      B      -118.626659       34.224762     A2     -118.6499422     34.24973484  2.177003
Trenton McKinney

I'm new but I think I can help.

The problem is that you are passing whole Series to a function that requires single values. You should iterate through the rows and select each value individually.

Try this code:

    for row in sub_set.index:
        site1 = (sub_set.loc[row, 'Site_Latitude_1'], sub_set.loc[row, 'Site_Longitude_1'])
        site2 = (sub_set.loc[row, 'Site_Latitude_2'], sub_set.loc[row, 'Site_Longitude_2'])
        print('Distance is:', distance.distance(site1, site2).miles, 'miles')

Output:

Distance is: 0.8592022243334677 miles
Distance is: 0.8592022243334677 miles
Distance is: 2.1770033222544773 miles
Distance is: 2.1770033222544773 miles
Distance is: 2.1770033222544773 miles

or:

dist = []
for row in sub_set.index:
    site1 = (sub_set.loc[row, 'Site_Latitude_1'], sub_set.loc[row, 'Site_Longitude_1'])
    site2 = (sub_set.loc[row, 'Site_Latitude_2'], sub_set.loc[row, 'Site_Longitude_2'])
    dist.append(distance.distance(site1, site2).miles)
sub_set['Distance'] = dist

Output:

  Site_1 Site_Longitude_1 Site_Latitude_1 Site_2 Site_Longitude_2 Site_Latitude_2  Distance
0      A      -118.645167       34.237917     A2      -118.649942       34.249735  0.859202
1      A      -118.645167       34.237917     A2      -118.649942       34.249735  0.859202
2      B      -118.626659       34.224762     A2      -118.649942       34.249735  2.177003
3      B      -118.626659       34.224762     A2      -118.649942       34.249735  2.177003
4      B      -118.626659       34.224762     A2      -118.649942       34.249735  2.177003
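
The same row-wise idea can probably be written without the repeated .loc lookups by zipping the four columns together. This is only a sketch: `rowwise_distance` and its `dist_fn` parameter are hypothetical names introduced here, and `dist_fn` stands for any function that takes two (lat, long) tuples.

```python
def rowwise_distance(lat1, lon1, lat2, lon2, dist_fn):
    """Apply dist_fn to each pair of (lat, long) points, one row at a time."""
    # zip yields one scalar from each column per row, so dist_fn only ever
    # receives single values, never a whole Series.
    return [dist_fn((a, b), (c, d))
            for a, b, c, d in zip(lat1, lon1, lat2, lon2)]


# Intended use with the question's variables (requires geopy and pandas):
# sub_set['Distance'] = rowwise_distance(
#     lat1, long1, lat2, long2,
#     lambda p, q: distance.distance(p, q).miles)
```

Passing the distance function in as an argument keeps the helper independent of geopy, so a different metric can be swapped in without touching the loop.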