   ID      st_lat     st_lng     end_lat    end_lng
0   4  127.035740  37.493954  127.035740  37.493954
1   4  127.035740  37.493954  127.035740  37.493954
2   5  127.034870  37.485865  127.034318  37.485645
3   5  127.034201  37.485598  127.035064  37.485949
4   5  127.035064  37.485949  127.034618  37.485938
My dataframe looks like the above. I am trying to create a new column by applying the haversine function, which requires two tuples. For example, haversine((lat, lng), (lat, lng)) returns the distance between two points.
All four coordinate columns are floats. Following https://www.geeksforgeeks.org/create-a-new-column-in-pandas-dataframe-based-on-the-existing-columns/ I tried
df["distance(km)"] = df.apply(lambda row:haversine((row.st_lat, row.st_lng), (row.end_lat, row.end_lng)))
which raises
AttributeError: ("'Series' object has no attribute 'st_lat'", 'occurred at index user_id')
I also tried
df["distance(km)"] = haversine((df.st_lat, df.st_lng), (df.end_lat, df.end_lng))
which raises TypeError: cannot convert the series to float.
I know this happens because df.st_lat is a whole Series, so each tuple I pass holds two Series instead of two floats.
For each (st_lat, st_lng) pair I want to compute the distance to the (end_lat, end_lng) pair in the same row and store the results in a new column.
Any help? I've looked at "how to split column of tuples in pandas dataframe?" and
"Split Column containing 2 values into different column in pandas df",
but those do the opposite of what I am trying to do.
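A note on the second attempt: haversine() works on scalar floats, so handing it whole columns fails. One way around that is to write the haversine formula with NumPy functions, which operate on entire Series at once. The sketch below is only an illustration under assumptions: haversine_np and EARTH_RADIUS_KM are made-up names, not part of the haversine package, and the sample values are copied as posted (127.x is outside the valid latitude range of -90 to 90, so the lat/lng labels in the data may be swapped).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in km

def haversine_np(lat1, lng1, lat2, lng2):
    """Great-circle distance in km; inputs are degrees (floats, arrays, or Series)."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Three rows from the question, with the column names kept as posted.
df = pd.DataFrame({
    "ID": [4, 5, 5],
    "st_lat": [127.035740, 127.034870, 127.034201],
    "st_lng": [37.493954, 37.485865, 37.485598],
    "end_lat": [127.035740, 127.034318, 127.035064],
    "end_lng": [37.493954, 37.485645, 37.485949],
})

# Whole columns go in, one distance per row comes out; no apply() needed.
df["distance(km)"] = haversine_np(df["st_lat"], df["st_lng"],
                                  df["end_lat"], df["end_lng"])
print(df)
```

Because nothing is called row by row, this tends to be much faster than apply() on large frames, at the cost of maintaining the formula yourself.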
EDIT: solved by using
def dist(row):
    # With axis=1, apply passes one row at a time, so each lookup is a single value.
    return haversine(row["start"], row["end"])
df["distance(km)"] = df.apply(dist, axis=1)
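For anyone landing here later: the key difference from the first attempt is axis=1. By default apply() works with axis=0 and hands the function each column, which is why the lambda could not find st_lat on its argument. The "start" and "end" columns used above are not shown in the question. The sketch below assumes they hold (lat, lng) tuples and shows one possible way to build them with zip. haversine_km is a hypothetical stand-in written with the math module so the example runs without the haversine package; swap the real haversine() back in if it is installed.

```python
import math
import pandas as pd

def haversine_km(p1, p2):
    """Hypothetical stand-in: km between two (lat, lng) tuples given in degrees."""
    lat1, lng1 = map(math.radians, p1)
    lat2, lng2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(a))

# Two rows from the question, with the column names kept as posted.
df = pd.DataFrame({
    "ID": [4, 5],
    "st_lat": [127.035740, 127.034870],
    "st_lng": [37.493954, 37.485865],
    "end_lat": [127.035740, 127.034318],
    "end_lng": [37.493954, 37.485645],
})

# axis=0 (the default): the function receives a column, so .st_lat is missing.
try:
    df.apply(lambda row: row.st_lat)
except AttributeError as exc:
    print("axis=0 fails:", exc)

# Assumed step: pack each coordinate pair into one tuple-valued column.
df["start"] = list(zip(df["st_lat"], df["st_lng"]))
df["end"] = list(zip(df["end_lat"], df["end_lng"]))

def dist(row):
    # row is one row of df, so row["start"] is a single (lat, lng) tuple.
    return haversine_km(row["start"], row["end"])

df["distance(km)"] = df.apply(dist, axis=1)
print(df[["ID", "distance(km)"]])
```

The tuple columns are optional. Passing axis=1 to the original lambda over the four float columns should work just as well.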