I have the following data:

Trip      Start_Lat    Start_Long    End_lat      End_Long     Starting_point    Ending_point
Trip_1    56.5624      -85.56845     58.568       45.568       A                 B
Trip_1    58.568       45.568        -200.568     -290.568     B                 C
Trip_1    -200.568     -290.568      56.5624      -85.56845    C                 D
Trip_2    56.5624      -85.56845     -85.56845    -200.568     A                 B
Trip_2    -85.56845    -200.568      -150.568     -190.568     B                 C

I would like to find the circuity of each trip, which is defined as

   Circuity = Total Distance Travelled (Trip A+B+C+D) - Straight Line Distance (A to D)
              ---------------------------------------------------------------------------
                            Total Distance Travelled (Trip A+B+C+D)
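
To make the definition concrete, here is a minimal sketch of the formula on made-up numbers. The function name circuity and the distances are illustrative assumptions, not values derived from the data above.

```python
def circuity(total_distance, straight_line_distance):
    """Share of the travelled distance that exceeds the direct route."""
    if total_distance == 0:
        return float("nan")  # undefined when nothing was travelled
    return (total_distance - straight_line_distance) / total_distance


# Hypothetical trip: legs of 30 km, 40 km and 30 km,
# with the first start and the last end 60 km apart.
example = circuity(30 + 40 + 30, 60)
print(example)
```

A value of 0 means the trip followed the direct route, and a value of 1 means the trip ended where it started.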

I tried the following code:

    from geopy.distance import great_circle
    df['Distance'] = df.apply(lambda x: great_circle((x['Start_Lat'], x['Start_Long']), (x['End_lat'], x['End_Long'])).km, axis=1)
    df['Total_Distance'] = df.groupby('Trip')['Distance'].transform('sum')  # total distance per trip

Could you help me find the straight-line distance and the circuity?

1 Answer

UPDATE:

You may want to convert your values to numeric dtypes first, because the NumPy functions used below cannot operate on object-dtype (string) columns:

    import pandas as pd

    cols = ['Start_Lat', 'Start_Long', 'End_lat', 'End_Long']
    df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
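
To check whether the conversion is needed and whether it worked, it can help to inspect the dtypes before and after. The following is a minimal sketch on a made-up frame (raw, converted and coord_cols are hypothetical names, and the "n/a" entry is invented to show what errors='coerce' does with unparseable values):

```python
import pandas as pd

# Hypothetical frame whose coordinates were read in as strings.
raw = pd.DataFrame({
    "Trip": ["Trip_1", "Trip_1"],
    "Start_Lat": ["56.5624", "58.568"],
    "Start_Long": ["-85.56845", "45.568"],
    "End_lat": ["58.568", "n/a"],        # "n/a" cannot be parsed as a number
    "End_Long": ["45.568", "-85.56845"],
})
coord_cols = ["Start_Lat", "Start_Long", "End_lat", "End_Long"]

print(raw[coord_cols].dtypes)  # string/object columns before the conversion

converted = raw.copy()
converted[coord_cols] = converted[coord_cols].apply(pd.to_numeric, errors="coerce")

print(converted[coord_cols].dtypes)        # numeric columns after the conversion
print(converted[coord_cols].isna().sum())  # how many values failed to parse, per column
```

Values that could not be parsed become NaN rather than raising an error, so it is worth looking at the NaN counts before computing any distances.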

If I understand correctly, you can do it this way:

    import numpy as np

    # vectorized haversine function
    def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
        """
        Slightly modified version of http://stackoverflow.com/a/29546836/2901002

        Calculate the great-circle distance between two points
        on the earth (specified in decimal degrees or in radians).
        With the default earth_radius the result is in kilometres.

        All (lat, lon) coordinates must have numeric dtypes and be of equal length.

        """
        if to_radians:
            lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])

        a = np.sin((lat2-lat1)/2.0)**2 + \
            np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2

        return earth_radius * 2 * np.arcsin(np.sqrt(a))

    # circuity of one trip: 1 - straight-line distance / total distance travelled
    def f(df):
        # columns are selected by name, so the result does not depend on column order
        first, last = df.iloc[0], df.iloc[-1]
        straight = haversine(first['Start_Lat'], first['Start_Long'],
                             last['End_lat'], last['End_Long'])
        total = haversine(df['Start_Lat'], df['Start_Long'],
                          df['End_lat'], df['End_Long']).sum()
        return 1 - straight / total

    df.groupby('Trip').apply(f)

Result:

In [120]: df.groupby('Trip').apply(f)
Out[120]:
Trip
Trip_1    1.000000
Trip_2    0.499825
dtype: float64
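
For reference, here is a self-contained sketch of the same idea that can be run as is. The names haversine_km and circuity_by_name and the trips data are made up for illustration. The coordinates were chosen to be valid (latitude within [-90, 90]), which some values in the question's sample are not, so distances computed from that sample are not geographically meaningful.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2, earth_radius=6371):
    """Great-circle distance in km; inputs are decimal degrees (scalars or Series)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return earth_radius * 2 * np.arcsin(np.sqrt(a))


def circuity_by_name(g):
    """1 - (first start to last end distance) / (sum of leg distances) for one trip."""
    first, last = g.iloc[0], g.iloc[-1]
    straight = haversine_km(first["Start_Lat"], first["Start_Long"],
                            last["End_lat"], last["End_Long"])
    total = haversine_km(g["Start_Lat"], g["Start_Long"],
                         g["End_lat"], g["End_Long"]).sum()
    return 1 - straight / total


# Hypothetical data: "loop" returns to its starting point,
# "direct" moves along a single meridian without detours.
trips = pd.DataFrame({
    "Trip":       ["loop", "loop", "loop", "direct", "direct"],
    "Start_Lat":  [0.0, 0.0, 10.0, 0.0, 10.0],
    "Start_Long": [0.0, 10.0, 10.0, 0.0, 0.0],
    "End_lat":    [0.0, 10.0, 0.0, 10.0, 20.0],
    "End_Long":   [10.0, 10.0, 0.0, 0.0, 0.0],
})

coord_cols = ["Start_Lat", "Start_Long", "End_lat", "End_Long"]
result = trips.groupby("Trip")[coord_cols].apply(circuity_by_name)
print(result)
```

The closed loop should come out at 1 and the detour-free trip at approximately 0. This also explains the 1.000000 for Trip_1 above: its last leg ends at the coordinates where its first leg starts, so the straight-line distance is zero.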
MaxU - stand with Ukraine
  • Thanks for the answer, but I get this error: TypeError: ufunc 'radians' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' –  May 15 '17 at 14:56
  • What dtypes does your DF have? – MaxU - stand with Ukraine May 15 '17 at 14:57
  • I have them as object. –  May 15 '17 at 14:58
  • Can you convert them to numerical dtypes? – MaxU - stand with Ukraine May 15 '17 at 14:59
  • I tried pd.to_numeric, but I get another error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). And when I tried values.astype(float), I get the error: TypeError: 'DataFrame' object is not callable :( –  May 15 '17 at 15:07
  • @Iris, could you provide a __reproducible__ sample data set? – MaxU - stand with Ukraine May 15 '17 at 15:53
  • Also, one more error, sorry: def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371): ^ SyntaxError: invalid syntax –  May 15 '17 at 18:38
  • @Iris, I can't reproduce it. Try to copy just the function definition and paste it into IPython/Jupyter. – MaxU - stand with Ukraine May 15 '17 at 18:45
  • I use PyCharm. Will that be a problem? –  May 15 '17 at 18:47
  • Thank you! I have been working on this. If I get this error: ufunc 'radians' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'', do you have any clue what else to do? –  May 15 '17 at 19:43
  • @Iris, as I told you before, I would need a __reproducible__ data set in order to be able to help you... – MaxU - stand with Ukraine May 15 '17 at 19:46
  • Thanks! The data is too big, so I tried extracting only the necessary columns and running the code. There is no more syntax error, just an error on the last line, where the data is grouped by Trip. I am trying to fix it. Thanks for your help! The actual trip data is alphanumeric; I hope that won't be a problem? –  May 15 '17 at 21:16