I'm trying to measure the great-circle (haversine) distance between successive points, grouped by ITEM. It works fine when I compute the distance without a groupby, but I get an error when using .progress_apply.

import pandas as pd
import numpy as np
from tqdm import tqdm

tqdm.pandas()  # registers .progress_apply on pandas objects

df = pd.DataFrame({"ITEM":['A', 'A', 'A', 'B', 'B'], 
               "LAT":[-20, -21, -20, -20, -20], 
               "LON":[150, 151, 150, 148, 149]
               })


def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.    

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))

    km = 6367 * c

    return km

# not grouping by ITEM
#df['distance'] = haversine(df.LAT.shift(), df.LON.shift(), df.loc[1:, 'LAT'], df.loc[1:, 'LON'])

# grouping by ITEM
df['distance'] = df.groupby(['ITEM']).progress_apply(haversine(df.LAT.shift(), df.LON.shift(), df.loc[1:, 'LAT'], df.loc[1:, 'LON']))

Out:

TypeError: 'Series' object is not callable

Intended Output:

  ITEM  LAT  LON  distance
0    A  -20  150       NaN
1    A  -21  151    147.32
2    A  -20  150    147.32
3    B  -20  148      0.00
4    B  -20  149    111.13
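
One possible way to get a per-ITEM distance, sketched here under a few assumptions: the error above arises because `haversine(...)` is evaluated first and its result (a Series) is handed to `.progress_apply`, which expects a callable. The sketch below sidesteps `apply` altogether by shifting LON and LAT within each group, so the first row of every ITEM gets NaN. It passes longitude and latitude in the order the function signature declares (`lon1, lat1, lon2, lat2`), whereas the call in the question passes LAT first, so the numbers it produces may not match the table above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ITEM": ['A', 'A', 'A', 'B', 'B'],
                   "LAT": [-20, -21, -20, -20, -20],
                   "LON": [150, 151, 150, 148, 149]})


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6367 * 2 * np.arcsin(np.sqrt(a))


# Shift within each ITEM so a row is only compared with the previous row
# of the same group; the first row of each group becomes NaN.
grouped = df.groupby("ITEM")
df["distance"] = haversine(grouped["LON"].shift(), grouped["LAT"].shift(),
                           df["LON"], df["LAT"])

print(df)
```

If a progress bar is still wanted, the same logic could be wrapped in a function and that function (not its result) passed to `.progress_apply`.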
Chopin