I'm trying to measure the distance between successive points, using the haversine function below, and I want to compute it separately for each ITEM. It works fine when I compute the distance without a groupby, but I get an error when I use .progress_apply on the grouped data.
import pandas as pd
import numpy as np
from tqdm import tqdm
tqdm.pandas()  # registers .progress_apply on pandas objects
df = pd.DataFrame({
    "ITEM": ['A', 'A', 'A', 'B', 'B'],
    "LAT": [-20, -21, -20, -20, -20],
    "LON": [150, 151, 150, 148, 149],
})
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the distance between two points
    on the earth (specified in decimal degrees)
    All args must be of equal length.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
# not grouping by ITEM
#df['distance'] = haversine(df.LAT.shift(), df.LON.shift(), df.loc[1:, 'LAT'], df.loc[1:, 'LON'])
# grouping by ITEM
df['distance'] = df.groupby(['ITEM']).progress_apply(haversine(df.LAT.shift(), df.LON.shift(), df.loc[1:, 'LAT'], df.loc[1:, 'LON']))
Out:
TypeError: 'Series' object is not callable
Intended Output:
  ITEM  LAT  LON  distance
0    A  -20  150       NaN
1    A  -21  151    147.32
2    A  -20  150    147.32
3    B  -20  148      0.00
4    B  -20  149    111.13
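For context on the error: `haversine(...)` is evaluated before `.progress_apply` runs, so the method receives the resulting Series instead of a function, and pandas then fails when it tries to call it. One possible direction, as a minimal sketch and not necessarily the only fix, is to skip `apply` altogether and shift the coordinates within each group with `groupby(...).shift()`, then call `haversine` once on whole columns. The sketch assumes the rows are already ordered within each ITEM, and it deliberately keeps the argument order from the ungrouped call above (LAT passed first, although the signature lists `lon1` first), since that order is what the intended numbers correspond to.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ITEM": ['A', 'A', 'A', 'B', 'B'],
    "LAT": [-20, -21, -20, -20, -20],
    "LON": [150, 151, 150, 148, 149],
})

def haversine(lon1, lat1, lon2, lat2):
    """Same function as in the question: distance in km, inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# Previous point *within the same ITEM*; the first row of each group becomes NaN.
grouped = df.groupby("ITEM")
prev_lat = grouped["LAT"].shift()
prev_lon = grouped["LON"].shift()

# Argument order kept as in the question's ungrouped call (an assumption,
# made so the values line up with the intended output).
df["distance"] = haversine(prev_lat, prev_lon, df["LAT"], df["LON"])
print(df.round(2))
```

With this approach the first row of every group, including row 3, comes out as NaN and not 0.00. If a zero is wanted there, something like `df["distance"].fillna(0)` could be applied afterwards, at the cost of also turning row 0 into zero.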