
I have consecutive GPS position data (latitude, longitude) sampled every second, every day. The data comes from multiple train trips running in different directions on the Paris-Lyon railway track.

I need to keep only the data from these trips that lies between the two stations Paris Gare de Lyon and Lyon Part Dieu. For example, I just need to filter out all the data between the GPS coordinates of the Paris station (48.844601, 2.373777) and the Lyon station (45.760573, 4.860163) from the multiple trips over this track.

         timestamp         train_id     latitude    longitude   train_speed     
   2021-03-01 00:00:00      3086       48.843067    2.378110    20.18520    
   2021-03-01 00:00:00      2086       48.843067    2.378110    20.18520
   2021-03-01 00:00:00      7073       48.837433    2.388602    0.18360
          ---               ---          ---          ---         ---
   2021-03-01 23:59:59      1041       48.726383    2.542348    156.86281   
   2021-03-01 23:59:59      5006       46.829850    4.492440    182.00002   
   2021-03-01 23:59:59      2086       46.829850    4.492440    182.00002

I used the method below, but it does not return only the data between Paris and Lyon. It also returns Paris-Lyon-Marseilles, Paris-Lyon-Toulouse, etc.

paris_lat=  48.844601
paris_lon= 2.373777
lyon_lat=  45.760573
lyon_lon= 4.860163

# filtering lat,lon between Paris and Lyon which is not working  
df= gps_data[(gps_data['latitude'].between(48.844601, 45.760573)) & (gps_data['longitude'].between(2.373777,4.860163))]

Any help in this regard will be highly appreciated.

Here is a sample dataset:

timestamp         train_id  latitude    longitude
2021-03-01 06:18:30  10     48.826300   2.400840
2021-03-01 06:18:31  10     48.826324   2.400820
2021-03-01 06:18:32  10     48.826350   2.400800
2021-03-01 06:18:33  10     48.826378   2.400780
2021-03-01 06:18:34  10     48.826410   2.400758
2021-03-01 06:18:35  10     48.826440   2.400737
2021-03-01 06:18:36  10     48.826470   2.400717
2021-03-01 06:18:37  10     48.826508   2.400695
2021-03-01 07:43:17  10     48.826153   2.401872    
2021-03-01 07:43:18  10     48.825980   2.402124    
2021-03-01 07:43:19  10     48.825813   2.402382
2021-03-01 11:17:52  10     43.308040   5.388560    
2021-03-01 11:17:53  10     43.308056   5.388590    
2021-03-01 11:17:54  10     43.308067   5.388617

Output: I need to filter the dataset so that only the [latitude, longitude] points lying between the GPS coordinates [48.844601, 2.373777] and [45.760573, 4.860163] remain:

timestamp             train_id  latitude    longitude
2021-03-01 06:18:30   10        48.826300   2.400840
2021-03-01 06:18:31   10        48.826324   2.400820
2021-03-01 06:18:32   10        48.826350   2.400800
2021-03-01 06:18:33   10        48.826378   2.400780
2021-03-01 06:18:34   10        48.826410   2.400758
2021-03-01 06:18:35   10        48.826440   2.400737
2021-03-01 06:18:36   10        48.826470   2.400717
2021-03-01 06:18:37   10        48.826508   2.400695
2021-03-01 07:43:17   10        48.826153   2.401872    
2021-03-01 07:43:18   10        48.825980   2.402124    
2021-03-01 07:43:19   10        48.825813   2.402382
MALAM
  • I think you might be able to get this information from another source, because this is not a trivial problem. What you are doing there is selecting all the data that falls inside the rectangle between Paris and Lyon. Anyway, if that's not possible, you could 1) order your data by id (so that the points are grouped by train), then look at each train's direction of movement (rate of change in latitude over rate of change in longitude) or something like that. Good luck – Ulises Bussi Oct 25 '21 at 15:35
  • Hello @Ulises Bussi, it would be highly appreciated if you could help me out with some sample code for that. Thanks! – MALAM Oct 25 '21 at 19:19
  • I'm not sure I understand why your idea of filtering by a bounding box wouldn't work. Could you post [a sample dataframe](https://stackoverflow.com/a/20159305/530160), along with your expected output? – Nick ODell Oct 25 '21 at 19:44
  • @Nick ODell, the question is updated with a sample dataframe. Could you please give your feedback? – MALAM Oct 25 '21 at 21:08
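
The suggestion in the comments (order by train, then look at the direction of movement) can be sketched as follows. This is only an illustration under stated assumptions: the sample rows are made up, the `heading_south` column name is hypothetical, and the sign of the change in latitude is used as a crude stand-in for "moving from Paris towards Lyon".

```python
import pandas as pd

# Hypothetical sample: train 10 moves south (towards Lyon), train 20 moves north.
gps_data = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-03-01 06:00:00", "2021-03-01 06:00:01", "2021-03-01 06:00:02",
        "2021-03-01 06:00:00", "2021-03-01 06:00:01", "2021-03-01 06:00:02",
    ]),
    "train_id": [10, 10, 10, 20, 20, 20],
    "latitude": [48.80, 48.70, 48.60, 45.80, 45.90, 46.00],
    "longitude": [2.40, 2.45, 2.50, 4.85, 4.80, 4.75],
})

# Order the points by train and time so consecutive rows belong to the same track.
gps_data = gps_data.sort_values(["train_id", "timestamp"])

# Change in latitude between consecutive samples of the same train.
# The first sample of each train has no predecessor, so its change is NaN.
lat_change = gps_data.groupby("train_id")["latitude"].diff()

# Assumption: a decreasing latitude means the train is heading south.
gps_data["heading_south"] = lat_change < 0

southbound = gps_data[gps_data["heading_south"]]
print(southbound)
```

On real data the per-second latitude change is noisy and is zero while a train is stopped, so smoothing the change over a longer window would probably be needed before trusting the sign.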

2 Answers

0

If I understand your problem correctly, you need the data from trains that only travel from Paris to Lyon. But along with the data you want, you are also retrieving trains that use the Paris-Lyon railway but do not finish their journey at Lyon station. If this is the case, you can use an additional condition to filter by maximum and minimum GPS coordinates, so that you remove trains that go beyond Lyon or Paris:

paris_lat=  48.844601
paris_lon= 2.373777
lyon_lat=  45.760573
lyon_lon= 4.860163

paris_lyon_trains = (
    gps_data.loc[
        (gps_data['latitude'].between(paris_lat, lyon_lat))
        & (gps_data['longitude'].between(paris_lon, lyon_lon))
    ]
)

only_paris_lyon_trains = (
    paris_lyon_trains.loc[
         (paris_lyon_trains["latitude"].max() == paris_lat)
         & (paris_lyon_trains["latitude"].min() == lyon_lat)
         & (paris_lyon_trains["longitude"].max() == lyon_lon)
         & (paris_lyon_trains["longitude"].min() == paris_lon)
    ]
)
BSP
  • Hello @BSP, it shows me `KeyError: False`. I think this is due to the `==` comparison with the exact points `paris_lat, lyon_lat`. Could you please update the code? And no, I only want the filtered data between Paris and Lyon, cutting off the trip after that. – MALAM Oct 26 '21 at 16:50
  • @MALAM answer edited. – BSP Oct 26 '21 at 17:24
  • I still get the same error, `KeyError: False`. Have you tried it with some sample data at your end? – MALAM Oct 26 '21 at 22:19
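
A note on the `KeyError: False` reported in these comments: `Series.max() == value` evaluates to a single boolean, so combining several such comparisons with `&` still yields one `True`/`False`, and `.loc` then tries to look that value up as a row label. A row filter needs a boolean Series with one entry per row. The sketch below shows that form for the bounding-box step only; the four rows are taken from the question's sample, and the `in_box` and `paris_lyon` names are illustrative. It cuts each track at the box edges, but it does not by itself separate directions or follow the curve of the railway.

```python
import pandas as pd

paris_lat, paris_lon = 48.844601, 2.373777
lyon_lat, lyon_lon = 45.760573, 4.860163

# A few rows from the question's sample data (two near Paris, two near Marseille).
gps_data = pd.DataFrame({
    "train_id": [10, 10, 10, 10],
    "latitude": [48.826300, 48.825813, 43.308040, 43.308067],
    "longitude": [2.400840, 2.402382, 5.388560, 5.388617],
})

# Series.between(left, right) tests left <= value <= right,
# so the smaller bound has to come first.
lat_min, lat_max = sorted([paris_lat, lyon_lat])
lon_min, lon_max = sorted([paris_lon, lyon_lon])

# One boolean per row: this is the kind of mask .loc accepts.
in_box = (
    gps_data["latitude"].between(lat_min, lat_max)
    & gps_data["longitude"].between(lon_min, lon_max)
)

paris_lyon = gps_data.loc[in_box]
print(paris_lyon)
```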
0

The task is to check whether a geographical point belongs to an area on a map, also known as a geographical shape. Your approach uses a simple rectangle and a naive algorithm, but many libraries exist for this purpose. For example, you can use S2 Geometry.

The first step towards a more reliable solution is to define polygonal shapes that cover each train line precisely. Of course, to build such geo-shapes you will need more coordinates than a pair of start and end points. I assume that the list of train stations (and therefore their coordinates) is available, and that will be enough to start.

The next step is to check whether a point belongs to a pre-built shape. With S2 Geometry, implementing both steps is a piece of cake.
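
To make the two steps concrete without relying on any particular library API, here is a minimal pure-Python sketch that uses plain ray casting instead of S2 Geometry. The `CORRIDOR` vertices are a rough, hypothetical quadrilateral around the Paris-Lyon axis, not a real track outline, and latitude/longitude are treated as flat coordinates, which is only an approximation.

```python
# Hypothetical corridor polygon as (latitude, longitude) vertices.
# A real shape would be built from station or track coordinates.
CORRIDOR = [
    (49.0, 2.0),
    (49.0, 3.2),
    (45.6, 5.4),
    (45.6, 4.4),
]


def point_in_polygon(lat, lon, polygon):
    """Return True if (lat, lon) lies inside the polygon (ray casting)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            # Count crossings on the side of increasing longitude.
            if lon < lon_cross:
                inside = not inside
    return inside


# Points taken from the question's data.
near_paris = point_in_polygon(48.826300, 2.400840, CORRIDOR)
mid_route = point_in_polygon(46.829850, 4.492440, CORRIDOR)
marseille = point_in_polygon(43.308040, 5.388560, CORRIDOR)
print(near_paris, mid_route, marseille)
```

A dedicated geometry library would handle spherical geometry and large point counts far better; the sketch is only meant to show what "check if a point belongs to a pre-built shape" amounts to.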

I.G.
  • Hello @I.G., yes, it's possible to create a `polygon.shp` file from a single Paris-Lyon trip, then use this predefined polygon to filter the GeoPandas data points (latitude, longitude) of the multiple train trips, keeping only the points that lie inside the `polygon.shp` shape. But this process is computationally expensive with lots of train trips, and it returns all the trips for both Paris-Lyon and Lyon-Paris. I only need each individual Paris-Lyon trip. Any idea how I can separate them? – MALAM Nov 07 '21 at 00:51
  • @MALAM, to separate there and back trips you can analyze coordinates of trip's first and last points. Checking distance to center coordinates of Paris and Lyon should be enough for that. – I.G. Nov 08 '21 at 07:49
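
The first-and-last-point check described in the last comment could look roughly like this. It is a sketch under assumptions: the station coordinates from the question stand in for the city centers, the 5 km `max_km` threshold is arbitrary, and `trip_direction` is a hypothetical helper name. It also assumes the data has already been split into individual trips.

```python
import math

PARIS = (48.844601, 2.373777)  # Paris Gare de Lyon, from the question
LYON = (45.760573, 4.860163)   # Lyon Part Dieu, from the question


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def trip_direction(first_point, last_point, max_km=5.0):
    """Classify a trip by which station its first and last points are near."""
    starts_paris = haversine_km(*first_point, *PARIS) <= max_km
    starts_lyon = haversine_km(*first_point, *LYON) <= max_km
    ends_paris = haversine_km(*last_point, *PARIS) <= max_km
    ends_lyon = haversine_km(*last_point, *LYON) <= max_km
    if starts_paris and ends_lyon:
        return "Paris-Lyon"
    if starts_lyon and ends_paris:
        return "Lyon-Paris"
    return "other"


# First point from the question's table; the last point is made up, near Lyon.
print(trip_direction((48.843067, 2.378110), (45.7606, 4.8600)))
print(trip_direction((45.7606, 4.8600), (48.843067, 2.378110)))
```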