If you're calculating distance travelled, then I think you need the distance between consecutive coordinates. You can use the `dist` function provided by the `proxy` package, which is a bit more flexible than the base `stats::dist` (it can compare the rows of two different matrices), and combine it with `dplyr`:
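
```r
library(proxy)
library(dplyr)

set.seed(1)  # make the random example reproducible
df <- data.frame(Name = c(rep("John", 5L), rep("Steve", 5L), rep("Dave", 5L)),
                 x = sample(1:30, 15L),
                 y = sample(1:30, 15L))

# Total Euclidean path length for one group's rows, in their current order
group_fun <- function(sub_df) {
  # a single point means no movement
  if (nrow(sub_df) == 1L)
    return(data.frame(Name = sub_df$Name, total = 0))
  x <- sub_df[-nrow(sub_df), c("x", "y")]  # every point except the last
  y <- sub_df[-1L, c("x", "y")]            # every point except the first
  # pairwise = TRUE: distance between row i of x and row i of y only
  total <- sum(proxy::dist(x, y, method = "Euclidean", pairwise = TRUE))
  # one summary row per group
  data.frame(Name = sub_df$Name[1L], total = total)
}

out <- df %>%
  group_by(Name) %>%
  do(group_fun(.))
```
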
Inside `group_fun`, `x` contains all coordinates except the last one, and `y` contains all coordinates except the first one (per group in both cases), so `x[i,]` and `y[i,]` are consecutive coordinates for any `i`, assuming the rows are already in the order the points were visited.

Therefore, when we call `proxy::dist` with `pairwise = TRUE`, we get only the distance between each pair of corresponding rows (one value per step of the path), and `sum` adds those steps up.
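
To see why `pairwise = TRUE` matters, here is a base-R sketch (no `proxy` needed) on a made-up three-point path. The `sqrt(rowSums((from - to)^2))` line is meant to mirror what `proxy::dist(from, to, method = "Euclidean", pairwise = TRUE)` computes; the full cross-distance matrix, by contrast, also pairs up points that are not consecutive, and only its diagonal holds the step lengths.

```r
# Hypothetical three-point path: (0,0) -> (3,4) -> (3,0)
pts <- cbind(x = c(0, 3, 3), y = c(0, 4, 0))

from <- pts[-nrow(pts), , drop = FALSE]  # all points except the last
to   <- pts[-1L, , drop = FALSE]         # all points except the first

# Row-wise Euclidean distance: row i of `from` vs row i of `to`
step_lengths <- sqrt(rowSums((from - to)^2))
step_lengths       # 5 4
sum(step_lengths)  # 9, the distance travelled

# Every `from` row against every `to` row (cross-distance matrix)
n <- nrow(from)
cross <- as.matrix(dist(rbind(from, to)))[seq_len(n), n + seq_len(n)]
diag(cross)        # 5 4, the same step lengths
sum(cross)         # 12, which overcounts
```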

In the returned data frame we use `sub_df$Name[1L]` because `Name` was a grouping variable, so it must be the same for all rows in `sub_df`, and we only want one of its values in the summary.

And if you want to be a bit more compact, you can do it without `dist` (i.e. only with `dplyr`):
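
```r
out <- df %>%
  group_by(Name) %>%
  # dplyr::lag() shifts each column down one row within the group; the
  # first row gets NA, which na.rm = TRUE drops from the sum
  summarise(total = sum(sqrt((x - lag(x))^2 + (y - lag(y))^2), na.rm = TRUE))
```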
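
For comparison, a base-R sketch of the same per-group total uses `diff()`, which gives `v[-1] - v[-length(v)]`, i.e. the same differences as `v - lag(v)` without the leading `NA`. The data frame `toy` and the helper `path_length` are hypothetical, chosen so both paths are built from 3-4-5 triangles and the answer can be checked by hand. A one-row group would give `sum(numeric(0))`, which is 0, matching the single-row branch in `group_fun`.

```r
# Hypothetical toy data with a known answer
toy <- data.frame(Name = c("A", "A", "A", "B", "B"),
                  x = c(0, 3, 3, 1, 4),
                  y = c(0, 4, 0, 1, 5))

# Sum of Euclidean step lengths between consecutive points
path_length <- function(x, y) sum(sqrt(diff(x)^2 + diff(y)^2))

# split() keeps each group's rows in their original order
totals <- sapply(split(toy, toy$Name), function(g) path_length(g$x, g$y))
totals  # A = 9, B = 5
```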