Subset dataframe based on the longest distance between two consecutive observations

Question

I have the dataframe below and I would like to subset it to find the observation where a name covered the longest distance between two consecutive observations. If a name covers exactly the same distance more than once, the most recent case should be selected.

So I would like to have 2 rows as the final result: the consecutive pair with the longest distance. If more than one consecutive pair qualifies, only the most recent should remain. I will then take those 2 points and display them on a map.

Here is my data:

name <- c("bal", "bal", "bal", "bal", "bal", "bal", "bal", "bal")
LAT <- c(54.77127, 54.76542, 54.76007, 54.75468, 54.74926, 54.74385, 54.73847, 54.73228)
LON <- c(18.99692, 18.99361, 18.99059, 18.98753, 18.98447, 18.98150, 18.97842, 18.97505)
dtime <- c("2016-12-16 02:37:02", "2016-12-16 02:38:02", "2016-12-16 02:38:32", "2016-12-16 02:39:08",
           "2016-12-16 02:39:52", "2016-12-16 02:41:02", "2016-12-16 02:42:02", "2016-12-16 02:42:32")
df <- data.frame(name, LAT, LON, dtime)

and this is how I think I should calculate the distance:

library(geosphere)
distm(c(as.numeric(df[1,3]),as.numeric(df[1,2])), c(as.numeric(df[2,3]),as.numeric(df[2,2])), fun = distHaversine)

and this regarding time difference:

difftime("2016-12-19 11:33:01", "2016-12-19 11:31:01", units = "secs")

but how can I combine them?

Edo · Accepted Answer · 2020-10-02T07:38:02.190

I think you can do everything with one pipeline in dplyr

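library(dplyr)
library(geosphere)  # provides distm() and distHaversine()

df %>%
  group_by(name) %>%
  # keep the previous position on the same row
  mutate(lag_LAT = lag(LAT), lag_LON = lag(LON)) %>%
  # distance in metres and time difference to the previous observation
  mutate(distance = diag(distm(cbind(lag_LON, lag_LAT), cbind(LON, LAT), fun = distHaversine)),
         timespan = difftime(dtime, lag(dtime), units = "secs")) %>%
  slice_max(distance) %>%  # longest step (ties are all kept)
  slice_max(dtime) %>%     # most recent among the ties
  ungroup()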

#> # A tibble: 1 x 8
#>   name    LAT   LON dtime               lag_LAT lag_LON distance timespan
#>   <chr> <dbl> <dbl> <chr>                 <dbl>   <dbl>    <dbl> <drtn>  
#> 1 bal    54.7  19.0 2016-12-16 02:42:32    54.7    19.0     722. 30 secs

Given your request in the comment, I added the first mutate to keep track of the previous position, so that you can plot it later. Having everything in a single row is much better than having two separate rows.

With the second mutate you calculate the distance between two consecutive points and the time difference. I did not question whether the distance calculation is correct; I assumed you know better than I do.
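The distance step in the pipeline builds a full n-by-n matrix with distm and then keeps only its diagonal. As a hedged alternative sketch, distHaversine can be called directly on two matrices of (longitude, latitude) rows, in which case it pairs them row by row. The names prev_pts, next_pts, and step_dist are illustrative and do not come from the original answer; the sketch also assumes a single name with rows already in time order.

```r
library(geosphere)

# Coordinates from the question
LAT <- c(54.77127, 54.76542, 54.76007, 54.75468, 54.74926, 54.74385, 54.73847, 54.73228)
LON <- c(18.99692, 18.99361, 18.99059, 18.98753, 18.98447, 18.98150, 18.97842, 18.97505)

n <- length(LAT)
prev_pts <- cbind(LON[-n], LAT[-n])  # observations 1..n-1, longitude first
next_pts <- cbind(LON[-1], LAT[-1])  # observations 2..n

# Row i of prev_pts is paired with row i of next_pts: n - 1 distances in metres
step_dist <- distHaversine(prev_pts, next_pts)

# The first observation has no predecessor, so pad with NA to match the data length
distance <- c(NA, step_dist)
```

Here which.max(distance) should point at the last observation, which is consistent with the row returned by the pipeline.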

The first slice_max identifies the max distance, while the second one is only needed in case of ties in the first (you said you wanted the most recent in case of ties).
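To see the tie-breaking in isolation, here is a small sketch on made-up data. The toy tibble and its values are hypothetical and only serve to illustrate that slice_max keeps all tied rows by default (with_ties = TRUE), so a second slice_max on the time column is what reduces the ties to the most recent row.

```r
library(dplyr)

# Hypothetical toy data: two steps share the maximum distance of 7
toy <- tibble(
  distance = c(5, 7, 7),
  dtime = as.POSIXct(c("2016-12-16 02:37:02",
                       "2016-12-16 02:38:02",
                       "2016-12-16 02:39:02"), tz = "UTC")
)

# Both rows with the maximum distance are kept
ties <- toy %>% slice_max(distance)

# The second slice_max keeps only the latest of the tied rows
latest <- toy %>% slice_max(distance) %>% slice_max(dtime)
```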

I grouped because I figured you may have more than one name in your dataset.

I did not get why you need to calculate the time difference, but I left it in.
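If two separate rows are still preferred for mapping, one possible sketch in base R is to locate the longest step by index and pull out both of its endpoints. This is not part of the pipeline in this answer; it assumes a single name and rows already sorted by time, and the names step_dist, last_start, and two_rows are illustrative.

```r
library(geosphere)

# Data from the question
df <- data.frame(
  name  = rep("bal", 8),
  LAT   = c(54.77127, 54.76542, 54.76007, 54.75468, 54.74926, 54.74385, 54.73847, 54.73228),
  LON   = c(18.99692, 18.99361, 18.99059, 18.98753, 18.98447, 18.98150, 18.97842, 18.97505),
  dtime = c("2016-12-16 02:37:02", "2016-12-16 02:38:02", "2016-12-16 02:38:32", "2016-12-16 02:39:08",
            "2016-12-16 02:39:52", "2016-12-16 02:41:02", "2016-12-16 02:42:02", "2016-12-16 02:42:32")
)

n <- nrow(df)

# step_dist[i] is the distance in metres from row i to row i + 1
step_dist <- distHaversine(cbind(df$LON[-n], df$LAT[-n]),
                           cbind(df$LON[-1], df$LAT[-1]))

# On exact ties, max(which(...)) picks the most recent step
last_start <- max(which(step_dist == max(step_dist)))

# Start and end point of the longest step, as two rows
two_rows <- df[c(last_start, last_start + 1), ]
```

With several names, the same logic would have to be applied per group, for example by splitting the data frame by name first.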

If you need to see more decimal places in a tibble, use `options(pillar.sigfig = 7)` as suggested [here](https://stackoverflow.com/questions/55018308/controlling-decimal-places-displayed-in-a-tibble-understanding-what-pillar-sigf) — Edo, Oct 01 '20 at 13:52
Thank you, and sorry for the misunderstanding. I would like to have 2 rows as the final result: the consecutive pair with the longest distance. If there is more than one such pair, only the most recent should remain. Then I will take those 2 points and display them on a map — firmo23, Oct 02 '20 at 01:49
Look at my edit. I think it will be more convenient for you to have everything in one row. — Edo, Oct 02 '20 at 07:38
