I have a data frame with 4 columns: the first two hold the coordinates of the first point (x1, y1) and the last two hold the coordinates of the second point (x2, y2). I need to calculate the distance between these points and store it in another column. To calculate the distance I use geopy.distance.geodesic, and I need it to be fast, so I don't want to do it in a loop. Can I do something like this in pandas?
df['distance'] = df['x1', 'y1', 'x2', 'y2'].map(lambda x,y,z,w = geodesic((x, y), (z, w)))

Pass the relevant series directly to a Haversine function: https://stackoverflow.com/a/51722117/2741091. Otherwise apply your `geopy` function. – ifly6 May 31 '23 at 13:05
your line of code should work. Did you try it??? – gtomer May 31 '23 at 13:28
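A minimal sketch of the vectorized Haversine idea from the first comment, assuming the x columns hold latitude and the y columns longitude in degrees (the same order the answers pass to geodesic). Note that this swaps in a spherical formula with an assumed mean Earth radius of 6371 km, so the results differ slightly (roughly up to 0.5%) from geopy's ellipsoidal geodesic. The function name haversine_km and the sample coordinates are illustrative only.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars, NumPy arrays or pandas Series."""
    # Convert all four inputs from degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical sample data: x = latitude, y = longitude (assumption)
df = pd.DataFrame({
    "x1": [0.0, 52.2296756], "y1": [0.0, 21.0122287],
    "x2": [0.0, 41.8919300], "y2": [1.0, 12.5113300],
})

# Whole columns are passed at once; no Python-level loop over rows
df["distance"] = haversine_km(df["x1"], df["y1"], df["x2"], df["y2"])
print(df["distance"])
```

Because the function only uses NumPy element-wise operations, the per-row arithmetic runs in compiled code, which is what makes this approach fast on large frames.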
2 Answers
If you want to use multiple fields of a dataframe row, you need to use something like:
df['dx'] = df.apply(lambda row: dx_function(row), axis=1)
in your main program, where each row is passed to the function dx_function.
In the function dx_function, use:
x1 = row['x1']
y1 = row['y1']
The function then returns the computed result.
If you want to modify just one column, use:
df['name'] = df['name'].map(lambda x: x.lower())
This would change the text in the column 'name' to lower case.
Hope this helps.

Yes. I understand that you want to calculate the distance from the values in each row without writing a loop. You can use DataFrame.apply() (see the pandas documentation) with a lambda function and axis=1, as shown below:
df['distance'] = df.apply(lambda row: geodesic((row['x1'], row['y1']), (row['x2'], row['y2'])), axis=1)
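This applies the lambda function to the pair of points in each row and stores the result in a new column, 'distance', on the same row. Note that apply(axis=1) still iterates over the rows in Python internally: it saves you from writing the loop yourself, but it is not vectorized.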
