
I need to calculate distances between two data points ((lat1,lon1) and (lat2,lon2)).


I found a way to do it here:

import geopy.distance

coords_1 = (52.2296756, 21.0122287)
coords_2 = (52.406374, 16.9251681)

# vincenty() was removed in geopy 2.0; geodesic() is its replacement
print(geopy.distance.geodesic(coords_1, coords_2).km)

To use it, I need to combine latitude and longitude into one column. I found a way here; however, it takes too much time.

df["point1"] = df[["lon1", "lat1"]].apply(Point, axis=1)
df["point2"] = df[["lon2", "lat2"]].apply(Point, axis=1)


Is there a faster solution?

John Mayer

3 Answers


Try using geopandas.points_from_xy():

import geopandas
df['points1'] = geopandas.points_from_xy(df.lon1, df.lat1)
df['points2'] = geopandas.points_from_xy(df.lon2, df.lat2)

If it is still too slow, install pygeos, which will vectorize points_from_xy() and speed it up further.

tdy
  • Yes, it works, but is it possible to get the new column in the format `[41.141412, -8.618643]` rather than `POINT (41.141412, -8.618643)`? Afterwards I need to save the values in a list using `df[df.city == 'ALL'].points1.values.tolist()` (I inserted `values` because otherwise I got the error `'GeometryArray' object has no attribute 'tolist'`). – John Mayer Apr 28 '21 at 11:42

If you want tuples of the form (x,y) you can do this:

Imagine your dataframe looks like this:

df = pd.read_csv(r"C:\users\k_sego\LatLong.csv", sep=";")
print(df)

        Lat        Lon
0   59.214735  18.062262
1   59.214735  18.062262
2   59.214735  18.062262
3   59.213542  18.063627
4   59.212553  18.064678
..        ...        ...
70  59.199559  18.046147
71  59.199559  18.046147
72  59.199559  18.046147
73  59.198898  18.051291
74  59.199044  18.055571

Then

df['new_col'] = list(zip(df.Lat, df.Lon))

produces this:

          Lat        Lon                 new_col
0   59.214735  18.062262  (59.214735, 18.062262)
1   59.214735  18.062262  (59.214735, 18.062262)
2   59.214735  18.062262  (59.214735, 18.062262)
3   59.213542  18.063627  (59.213542, 18.063627)
4   59.212553  18.064678  (59.212553, 18.064678)
..        ...        ...                     ...
70  59.199559  18.046147  (59.199559, 18.046147)
71  59.199559  18.046147  (59.199559, 18.046147)
72  59.199559  18.046147  (59.199559, 18.046147)
73  59.198898  18.051291  (59.198898, 18.051291)
74  59.199044  18.055571  (59.199044, 18.055571)

  • Thank you for your answer. I am fairly new to Python; I think I need lists, with `[` and `]`. I also tried my code with tuples and got the error `'tuple' object has no attribute 'tolist'` when I ran `df[df.city == 'ALL'].points1.tolist()` after the `zip()` step. – John Mayer Apr 28 '21 at 12:09
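
The `[lat, lon]` format requested in the comments can also be produced without storing a point column at all. The following is a minimal sketch, not taken from any of the answers: the sample rows and the `city` values are made up for illustration, and the column names `lat1`/`lon1` follow the question. It selects both coordinate columns for the filtered rows and converts the resulting 2-D array to nested Python lists.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "city": ["ALL", "ALL", "OTHER"],
    "lat1": [41.141412, 41.159826, 52.2296756],
    "lon1": [-8.618643, -8.641194, 21.0122287],
})

# Filter the rows, keep the two coordinate columns, and turn the
# resulting 2-D NumPy array into a list of [lat, lon] lists.
pairs = df.loc[df.city == "ALL", ["lat1", "lon1"]].to_numpy().tolist()
print(pairs)
```

Because the conversion happens on a plain float array, the result is an ordinary list of lists and no `tolist` call is needed on individual elements.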

If you want 'point' as a tuple:

df['point1'] = list(zip(df['lat1'].values, df['lon1'].values))

If you want 'point' as a list:

df['point1'] = list(map(list,zip(df['lat1'].values, df['lon1'].values)))

Performance comparison:

%timeit geopandas.points_from_xy(df.D, df.B)
108 µs ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit list(map(list,zip(df['D'].values, df['B'].values)))
4.82 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

As you can see, zip/list/map is a lot faster here: roughly 22 times in this measurement.
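
If the end goal is only the distance between (lat1, lon1) and (lat2, lon2), building a point column may not be necessary. The sketch below swaps in a different technique from the geopy call in the question: a vectorized haversine formula in NumPy, applied to whole columns at once. It assumes a spherical Earth with a mean radius of 6371.0088 km, so results can differ from geopy's ellipsoidal distance by a fraction of a percent. The function name `haversine_km` and the `dist_km` column are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd

# Mean Earth radius in km; an assumption of the spherical model.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km for arrays of coordinates in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# The two coordinate pairs from the question, stored as one row.
df = pd.DataFrame({
    "lat1": [52.2296756], "lon1": [21.0122287],
    "lat2": [52.406374], "lon2": [16.9251681],
})

# One vectorized call computes the distance for every row.
df["dist_km"] = haversine_km(df["lat1"], df["lon1"], df["lat2"], df["lon2"])
print(df["dist_km"])
```

When ellipsoidal accuracy matters, the geopy function remains the more precise choice; the trade-off here is speed on large frames against a small approximation error.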

Nk03