
I have a dataframe, shown below, that contains inspector tracking data (latitude and longitude) recorded roughly once a minute.

Inspector_ID   Timestamp               Latitude      Longitude
1              2018-07-24 7:31:00      100.491491    13.725239
1              2018-07-24 7:31:01      101.491491    15.725239
1              2018-07-24 7:32:04      104.491491    14.725239
1              2018-07-24 7:33:06      102.491491    10.725239
2              2018-07-24 8:35:08      105.491491    8.725239
2              2018-07-24 8:36:10      101.491491    15.725239
2              2018-07-24 8:37:09      101.491491    12.725239
2              2018-07-24 8:39:00      106.491491    16.725239

From the above data, I would like to find the distance travelled by each inspector between consecutive tracking points (the inspector's latitude and longitude).

Expected output (example)

Inspector_ID      Timestamp               Latitude      Longitude    Distance
    1              2018-07-24 7:31:00      100.491491    13.725239    nan
    1              2018-07-24 7:31:01      101.491491    15.725239    2.3
    1              2018-07-24 7:32:04      104.491491    14.725239    1.2
    1              2018-07-24 7:33:06      102.491491    10.725239    3.6
    2              2018-07-24 8:35:08      105.491491    8.725239     nan
    2              2018-07-24 8:36:10      101.491491    15.725239    5.6
    2              2018-07-24 8:37:09      101.491491    12.725239    2.1
    2              2018-07-24 8:39:00      106.491491    16.725239    3

I would like to calculate the distance grouped by Inspector_ID, so that the first row of each inspector is NaN.

Note: The numbers in the Distance column are placeholders, not the correct distances.

I do not know how to calculate distance from latitude and longitude, and I am also very new to pandas.

Danish
  • `geopandas` can calculate distance for you - why do you say it gives incorrect answer? BTW, Inspector 2 is a true superman, between first two points he travelled with speed ~7.75 km / s, he's likely on International Space Station :). – Lukasz Tracewski Nov 17 '19 at 07:36
  • Does this answer your question? [Calculate distance from GPS data \[longitude and latitude\]](https://stackoverflow.com/questions/29625708/calculate-distance-from-gps-data-longitude-and-latitude) – gosuto Nov 17 '19 at 07:36
  • @LukaszTracewski it is just an example output – Danish Nov 17 '19 at 07:42
  • @jorijnsmit I would like to calculate the distance travelled by each inspector. The one you suggested handles only one inspector. – Danish Nov 17 '19 at 07:49
  • Actually in that question they show how to calculate distance per row and then how to group it, which is exactly what you are looking for. – gosuto Nov 17 '19 at 07:50
  • @jorijnsmit distance per row only if the inspector is same, otherwise nan – Danish Nov 17 '19 at 07:53
  • @LukaszTracewski Can you please share your code here? – Danish Nov 17 '19 at 09:00
  • please check my answer @ALI – ansev Nov 17 '19 at 11:04

1 Answer


Use GroupBy.diff:

df['distance'] = df.groupby('Inspector_ID')[['Latitude', 'Longitude']].diff().pow(2).sum(axis=1, min_count=1).pow(1/2)
print(df)

   Inspector_ID           Timestamp    Latitude  Longitude  distance
0             1 2018-07-24 07:31:00  100.491491  13.725239       NaN
1             1 2018-07-24 07:31:01  101.491491  15.725239  2.236068
2             1 2018-07-24 07:32:04  104.491491  14.725239  3.162278
3             1 2018-07-24 07:33:06  102.491491  10.725239  4.472136
4             2 2018-07-24 08:35:08  105.491491   8.725239       NaN
5             2 2018-07-24 08:36:10  101.491491  15.725239  8.062258
6             2 2018-07-24 08:37:09  101.491491  12.725239  3.000000
7             2 2018-07-24 08:39:00  106.491491  16.725239  6.403124

If you want 0 instead of NaN on the first row of each inspector, remove min_count=1.

  • What this does is calculate the magnitude (Euclidean norm) of each vector formed by consecutive points of the same inspector.
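
The one-liner above can be unpacked into two steps, which may make the logic easier to follow. This is only a sketch of the same idea: it rebuilds the question's sample data by hand (Timestamp omitted for brevity), and it swaps the `pow`/`sum`/`pow` chain for `np.hypot`, which computes the same Euclidean norm. The result is still in degrees, not in a length unit.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Timestamp omitted for brevity).
df = pd.DataFrame({
    "Inspector_ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Latitude": [100.491491, 101.491491, 104.491491, 102.491491,
                 105.491491, 101.491491, 101.491491, 106.491491],
    "Longitude": [13.725239, 15.725239, 14.725239, 10.725239,
                  8.725239, 15.725239, 12.725239, 16.725239],
})

# Step 1: difference between each row and the previous row of the SAME inspector.
# The first row of every inspector has no predecessor, so it becomes NaN.
delta = df.groupby("Inspector_ID")[["Latitude", "Longitude"]].diff()

# Step 2: Euclidean norm of each (dLat, dLon) vector, still in degrees.
# np.hypot propagates NaN, so the first row of each inspector stays NaN.
df["distance"] = np.hypot(delta["Latitude"], delta["Longitude"])

print(df)
```

This assumes the rows are already sorted by Timestamp within each inspector; if not, sort first with `df.sort_values(['Inspector_ID', 'Timestamp'])`.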
ansev
  • Is that distance in km? – Danish Nov 17 '19 at 11:10
  • I am not an expert in this. The latitude and longitude here come from GIS data. In that case, is it okay to use your code above? – Danish Nov 17 '19 at 11:14
  • My previous code works perfectly; you just have to take the necessary unit conversion into account, that is, `df['km distance'] = n * df['distance']`, where n is the scale factor. – ansev Nov 17 '19 at 11:16
  • Without any scale factor, what is the unit of the distance? – Danish Nov 17 '19 at 11:21
  • The distance is in exactly the same unit as the latitude and longitude. I can't tell which unit that is, because I don't know where you got the data, and I don't think this is a pandas or programming problem. Try checking your data source. `Latitude`, `Longitude` and `distance` are all in the same unit of measure. – ansev Nov 17 '19 at 11:24
  • Note that measuring distance in angular units (lat/lon units) almost never makes sense. You cannot convert those to km, and there is no "scale factor" for that, because the distance covered by 1' of latitude differs from the distance covered by 1' of longitude unless you are right at the equator. I would not use this formula if you want distance in length units (meters, feet, etc.). – Michael Entin Nov 18 '19 at 01:35
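
For a distance in length units, one common alternative (not taken from the answer above) is the haversine great-circle formula applied per inspector. The sketch below is a minimal illustration under stated assumptions: the Earth is treated as a sphere of mean radius 6371 km, coordinates are decimal degrees, rows are already sorted by time within each inspector, and the names `haversine_km`, `add_distance_km` and `distance_km` are hypothetical. The demo coordinates are made up for illustration; the question's sample has Latitude values above 90, which are not valid latitudes, so it is not used here.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_distance_km(df, lat="Latitude", lon="Longitude", key="Inspector_ID"):
    """Return a copy with 'distance_km': distance from the same inspector's previous point."""
    out = df.copy()
    # Previous point within each inspector; NaN on each inspector's first row.
    prev = out.groupby(key)[[lat, lon]].shift()
    out["distance_km"] = haversine_km(prev[lat], prev[lon], out[lat], out[lon])
    return out


# Hypothetical demo data: each inspector moves by exactly one degree.
demo = pd.DataFrame({
    "Inspector_ID": [1, 1, 2, 2],
    "Latitude": [0.0, 0.0, 0.0, 1.0],
    "Longitude": [0.0, 1.0, 0.0, 0.0],
})
result = add_distance_km(demo)
print(result)
```

On the equator, one degree of longitude and one degree of latitude both come out at about 111 km, but at 60° latitude one degree of longitude is only about half that, which is why a single scale factor cannot convert degree differences to kilometres.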