First, sorry for the long introduction, but I think it will help in understanding the problem. I am working on a project where we are trying to use a large floating car data set to infer human mobility patterns, and I am using RStudio to do so. We have two files. The first, trips.csv, contains 375,000 trips with metadata such as the trip ID, start/end location (longitude, latitude) and other fields. The second, waypoints.csv, contains the full GPS waypoint data, listed trip by trip, including the waypoint sequence, location and other fields.
In total, there are nearly 10 million waypoints (second file) for these 375,000 trips (first file), so each trip in the first file has several waypoints in the second file that together form that trip's trajectory. The following tables show samples from each file with only the columns I need for this problem:
Trip Data
TripId,Lon1,Lat1,Lon2,Lat2,distance
bb983d,11.565,48.19,11.55,48.143,7498
da5bgg,11.584,48.157,11.639,48.098,1364
5aefeg,11.591,48.142,11.563,48.18,7377
Waypoint Data
TripId,sequence,Lon,Lat
bb983d,0,11.565,48.19
bb983d,1,11.56688,48.18158
bb983d,2,11.56351,48.18144
bb983d,3,11.56335,48.1808
bb983d,4,11.5654,48.17617
da5bgg,0,11.584,48.157
da5bgg,1,11.583417,48.155167
da5bgg,2,11.578472,48.144556
da5bgg,3,11.57075,48.142139
5aefeg,0,11.591,48.142
5aefeg,1,11.58994,48.13956
5aefeg,2,11.58797,48.13706
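Because the waypoints only form a trajectory when taken in sequence order, a natural first step is to sort them by trip and sequence and split them into one vertex table per trip. The sketch below does this in base R on a few rows of the sample above (deliberately shuffled); the object name trajectories is hypothetical.

```r
# A few rows of the waypoint sample above, deliberately out of order
waypoints <- read.csv(text = c(
  "TripId,sequence,Lon,Lat",
  "da5bgg,1,11.583417,48.155167",
  "bb983d,2,11.56351,48.18144",
  "bb983d,0,11.565,48.19",
  "da5bgg,0,11.584,48.157",
  "bb983d,1,11.56688,48.18158"
))

# Sort by trip, then by waypoint sequence, so each trajectory is in path order
waypoints <- waypoints[order(waypoints$TripId, waypoints$sequence), ]

# One data frame of (Lon, Lat) vertices per trip, keyed by TripId
trajectories <- split(waypoints[, c("Lon", "Lat")], waypoints$TripId)
sapply(trajectories, nrow)  # bb983d: 3, da5bgg: 2
```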
Here is the dput output to reproduce the data frames:
# dput(droplevels(head(trips)))
trips <- structure(list(TripId = structure(1:6, .Label = c("00a7da9f4b503f36fc937f386b11ca58", "00aa3cb70345798d9b1d92bc4685b3ee", "017cb0697a1135c5cd3479c1edc2aa6b", "01cc30aa0e036817cf4bdc468c9fad8a", "01f0a6a90ec964ae8014d2f750231663", "02949197deca3f1d52906cfc147454c5"), class = "factor"), StartLocLat = c(48.178, 48.098, 48.15, 48.176, 48.149, 48.151), startLocLon = c(11.573, 11.501, 11.503, 11.558, 11.503, 11.563), EndLocLat = c(48.143, 48.098, 48.18, 48.168, 48.148, 48.127), EndLocLon = c(11.55, 11.639, 11.563, 11.526, 11.616, 11.554)), row.names = c(NA, 6L), class = "data.frame")
# dput(droplevels(head(waypoints)))
waypoints <- structure(list(TripId = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("00a7da9f4b503f36fc937f386b11ca58", "00aa3cb70345798d9b1d92bc4685b3ee"), class = "factor"), Sequence = c(0L, 1L, 2L, 3L, 4L, 0L), Latitude = c(48.178, 48.18158, 48.18144, 48.1808, 48.17617, 48.098), Longitude = c(11.573, 11.56688, 11.56351, 11.56335, 11.5654, 11.501)), row.names = c(NA, 6L), class = "data.frame")
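Before computing any area, it may be worth checking whether the straight line should be drawn from the trip table's start/end coordinates or from the first/last waypoint. The sketch below (an assumption about how to decide this, not something the question specifies) uses base R's merge() to test whether each trip's lowest-Sequence waypoint coincides with its recorded start location. It uses the column names from the dput output and the values of its first two trips; the names first_wp and starts_at_first_wp are hypothetical, and the 1e-6 degree tolerance is arbitrary. The end point is not checked here because head() truncates each trip's waypoints.

```r
# Minimal stand-ins with the dput column names and its first two trips
ids <- c("00a7da9f4b503f36fc937f386b11ca58", "00aa3cb70345798d9b1d92bc4685b3ee")
trips <- data.frame(TripId = ids,
                    StartLocLat = c(48.178, 48.098), startLocLon = c(11.573, 11.501),
                    EndLocLat = c(48.143, 48.098), EndLocLon = c(11.55, 11.639))
waypoints <- data.frame(TripId = ids[c(1, 1, 1, 1, 1, 2)],
                        Sequence = c(0L, 1L, 2L, 3L, 4L, 0L),
                        Latitude = c(48.178, 48.18158, 48.18144, 48.1808, 48.17617, 48.098),
                        Longitude = c(11.573, 11.56688, 11.56351, 11.56335, 11.5654, 11.501))

# Keep the lowest-Sequence waypoint of each trip
first_wp <- waypoints[order(waypoints$TripId, waypoints$Sequence), ]
first_wp <- first_wp[!duplicated(first_wp$TripId), ]

# Join on TripId and compare coordinates (tolerance in degrees is arbitrary)
chk <- merge(trips, first_wp, by = "TripId")
chk$starts_at_first_wp <- abs(chk$Latitude - chk$StartLocLat) < 1e-6 &
  abs(chk$Longitude - chk$startLocLon) < 1e-6
chk[, c("TripId", "Sequence", "starts_at_first_wp")]
```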
Now I would like to add a deviation-area column: for each trip, the area between a virtual straight line from the trip's start point to its end point and the actual trajectory formed by connecting the waypoints, in sequence order, with line segments.
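One possible approach, sketched here under several assumptions, is to close each trajectory with its straight chord and apply the shoelace formula. A plain shoelace over the closed ring returns the net signed area, so parts of the path on opposite sides of the chord cancel each other; the sketch therefore cuts the path wherever it crosses the chord line and sums the absolute areas of the resulting lobes. Assumptions: the helper names shoelace, deviation_area_xy and deviation_area are hypothetical; the chord runs from the first to the last waypoint; lon/lat are projected to metres with an equirectangular approximation, which should be adequate at city scale; start and end must differ; and a path that loops over itself within one lobe is not resolved.

```r
# Signed area of a polygon via the shoelace formula; the closing edge from
# the last vertex back to the first is implied.
shoelace <- function(x, y) {
  n <- length(x)
  if (n < 3) return(0)
  j <- c(2:n, 1)
  sum(x * y[j] - x[j] * y) / 2
}

# Area between a planar polyline and the straight chord from its first to
# its last vertex. The path is cut wherever it crosses the chord line and
# each piece is closed along that line, so lobes on opposite sides add up
# instead of cancelling. Assumes first != last vertex; a path that loops
# over itself within one lobe is not resolved.
deviation_area_xy <- function(x, y) {
  n <- length(x)
  if (n < 3) return(0)
  # Move the start to the origin and rotate so the chord lies on the u axis
  x <- x - x[1]
  y <- y - y[1]
  theta <- atan2(y[n], x[n])
  u <- x * cos(theta) + y * sin(theta)
  v <- -x * sin(theta) + y * cos(theta)  # signed offset from the chord line
  v[c(1, n)] <- 0                        # drop rounding noise at the ends
  # Rebuild the path, inserting a vertex wherever a segment crosses v = 0
  pu <- u[1]
  pv <- v[1]
  for (i in seq_len(n - 1)) {
    if (v[i] * v[i + 1] < 0) {
      t <- v[i] / (v[i] - v[i + 1])
      pu <- c(pu, u[i] + t * (u[i + 1] - u[i]))
      pv <- c(pv, 0)
    }
    pu <- c(pu, u[i + 1])
    pv <- c(pv, v[i + 1])
  }
  # Each run between consecutive on-line vertices is one closed lobe
  cuts <- which(pv == 0)
  area <- 0
  for (k in seq_len(length(cuts) - 1)) {
    idx <- cuts[k]:cuts[k + 1]
    area <- area + abs(shoelace(pu[idx], pv[idx]))
  }
  area
}

# Lon/lat in degrees -> local metres (equirectangular approximation centred
# on the first point; an assumption that should be fine at city scale).
deviation_area <- function(lon, lat) {
  r <- 6371000  # mean Earth radius in metres
  rad <- pi / 180
  x <- r * (lon - lon[1]) * rad * cos(lat[1] * rad)
  y <- r * (lat - lat[1]) * rad
  deviation_area_xy(x, y)  # square metres
}

# A path that crosses its chord once: two triangles of area 1 each
deviation_area_xy(c(0, 1, 3, 4), c(0, 1, -1, 0))  # 2
shoelace(c(0, 1, 3, 4), c(0, 1, -1, 0))           # 0: the lobes cancel

# Per-trip use on a toy data frame (made-up values) with the dput column
# names; sort by Sequence inside each trip before computing the area
wp <- data.frame(
  TripId    = c("a", "a", "a", "b", "b", "b"),
  Sequence  = c(2, 0, 1, 0, 1, 2),
  Longitude = c(11.57, 11.56, 11.565, 11.50, 11.51, 11.52),
  Latitude  = c(48.17, 48.17, 48.175, 48.10, 48.10, 48.10)
)
areas <- sapply(split(wp, wp$TripId), function(d) {
  d <- d[order(d$Sequence), ]
  deviation_area(d$Longitude, d$Latitude)
})
areas  # "b" is a straight path, so its area is 0
```

The same split()/sapply() pattern should carry over to the real waypoints data frame, with the result matched back to trips by TripId. With roughly 10 million rows, a grouped data.table or dplyr summary may be noticeably faster than split(). If paths that loop over themselves matter, the sf package (st_polygon, st_make_valid, st_area) may be worth checking as an alternative, since it can repair self-intersecting rings before measuring them.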
The attached image may help illustrate the area in question:
I did some quick research but didn't find exactly what I need, especially since I have to do this for all trips.
Any hints or suggestions, with code if possible, would be greatly appreciated!