Classify a timestamp as occurring before or after a distance limit is reached in R

Question

I have a dataframe consisting of a series of timestamps with lat-lon point locations relating to animal GPS tracking data, grouped into separate trips made by each animal. For each timestamped lat-lon, I also have the distance of the point to the animals' home colony (in km).

I would like to classify each point according to whether it occurred before or after the animal reached its maximum distance from its home colony.

The aim is to have a column in the dataframe stating whether the timestamped lat-lon occurs during the outward section of the animal's trip (defined as all points before the animal reached its maximum distance from its home colony) or the return section (all points that occurred after the animal reached its maximum distance from its home colony and before it returned to the colony).

Here is example data from 2 trips:

My desired output is the table below, with the addition of the 'Loc_Class' (location classification) column, where MAX = the point of maximum distance from the colony, OUT = points before the animal reaches that maximum, and RET = points after the animal has reached the maximum distance and is returning to the colony.

Trip_ID	Timestamp	LON	LAT	Colony_lat	Colony_lon	Dist_to_Colony	Loc_Class
A	18/01/2022 14:00	-2.81698	-69.831474	-71.89	5.159	369.9948202	MAX
A	18/01/2022 14:30	-2.750411	-69.811873	-71.89	5.159	369.5644383	RET
A	18/01/2022 15:00	-2.736943	-69.811022	-71.89	5.159	369.2463158	RET
A	18/01/2022 15:30	-2.645026	-69.804136	-71.89	5.159	367.1665826	RET
A	18/01/2022 16:00	-2.56825	-69.833432	-71.89	5.159	362.7877481	RET
B	18/01/2022 21:30	-3.046828	-69.784849	-71.89	5.159	380.0350746	OUT
B	18/01/2022 22:00	-3.080154	-69.765688	-71.89	5.159	382.4142364	OUT
B	19/01/2022 00:30	-3.025742	-69.634483	-71.89	5.159	390.8078861	MAX
B	19/01/2022 01:00	-2.898522	-69.672147	-71.89	5.159	384.3511473	RET
B	19/01/2022 01:30	-2.907463	-69.769916	-71.89	5.159	377.173593	RET

library(tidyverse)
library(dplyr)
library(geosphere)

#load dataframe
df <- read.csv("Tracking_Data.csv")

#Great circle (geodesic) - add the geodesic distance (in m) between the timestamped location and the animal's colony
df_2 <- df %>% mutate(Dist_to_Colony = distGeo(cbind(LON, LAT), cbind(Colony_lon, Colony_lat)))

#change distance from colony from m to km
df_2 <- df_2 %>% mutate(Dist_to_Colony = Dist_to_Colony/1000)

#find the maximum distance to colony for each trip
Max_dist_colony <- df_2 %>% group_by(Trip_ID) %>% summarise(across(c(Dist_to_Colony), max))

#so now I need to classify each point using the 'Timestamp' and 'Dist_to_Colony' column and make a 'Loc_Class' column: 

#example df

| Trip_ID | Timestamp        | LON       | LAT        | Colony_lat | Colony_lon | Dist_to_Colony |
| ------- | ---------------- | --------- | ---------- | ---------- | ---------- | -------------- |
| A       | 18/01/2022 14:00 | -2.81698  | -69.831474 | -71.89     | 5.159      | 369.9948202    |
| A       | 18/01/2022 14:30 | -2.750411 | -69.811873 | -71.89     | 5.159      | 369.5644383    |
| A       | 18/01/2022 15:00 | -2.736943 | -69.811022 | -71.89     | 5.159      | 369.2463158    |
| A       | 18/01/2022 15:30 | -2.645026 | -69.804136 | -71.89     | 5.159      | 367.1665826    |
| A       | 18/01/2022 16:00 | -2.56825  | -69.833432 | -71.89     | 5.159      | 362.7877481    |
| B       | 18/01/2022 21:30 | -3.046828 | -69.784849 | -71.89     | 5.159      | 380.0350746    |
| B       | 18/01/2022 22:00 | -3.080154 | -69.765688 | -71.89     | 5.159      | 382.4142364    |
| B       | 19/01/2022 00:30 | -3.025742 | -69.634483 | -71.89     | 5.159      | 390.8078861    |
| B       | 19/01/2022 01:00 | -2.898522 | -69.672147 | -71.89     | 5.159      | 384.3511473    |
| B       | 19/01/2022 01:30 | -2.907463 | -69.769916 | -71.89     | 5.159      | 377.173593     |

r2evans · Accepted Answer · 2022-11-19T18:48:47.137

Something like this?

comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))
quux %>%
  group_by(Trip_ID) %>%
  mutate(Direction = comp3(row_number(), which.max(Dist_to_Colony), c("OUT", "MAX", "RET"))) %>%
  ungroup()
# # A tibble: 10 x 9
#    Trip_ID Timestamp          LON   LAT Colony_lat Colony_lon Dist_to_Colony Loc_Class Direction
#    <chr>   <chr>            <dbl> <dbl>      <dbl>      <dbl>          <dbl> <chr>     <chr>    
#  1 A       18/01/2022 14:00 -2.82 -69.8      -71.9       5.16           370. MAX       MAX      
#  2 A       18/01/2022 14:30 -2.75 -69.8      -71.9       5.16           370. RET       RET      
#  3 A       18/01/2022 15:00 -2.74 -69.8      -71.9       5.16           369. RET       RET      
#  4 A       18/01/2022 15:30 -2.65 -69.8      -71.9       5.16           367. RET       RET      
#  5 A       18/01/2022 16:00 -2.57 -69.8      -71.9       5.16           363. RET       RET      
#  6 B       18/01/2022 21:30 -3.05 -69.8      -71.9       5.16           380. OUT       OUT      
#  7 B       18/01/2022 22:00 -3.08 -69.8      -71.9       5.16           382. OUT       OUT      
#  8 B       19/01/2022 00:30 -3.03 -69.6      -71.9       5.16           391. MAX       MAX      
#  9 B       19/01/2022 01:00 -2.90 -69.7      -71.9       5.16           384. RET       RET      
# 10 B       19/01/2022 01:30 -2.91 -69.8      -71.9       5.16           377. RET       RET
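
The classification above depends on the rows of each trip already being in time order, since it compares row positions rather than timestamps. As a rough sketch only, here is one way to sort first and then classify using base R instead of dplyr. It assumes the Timestamp strings are day/month/year as in the example and treats them as UTC; the data values, the Time column, and the classify helper are made up for illustration. Note that which.max() picks the first row when the maximum distance is tied.

```r
# Hypothetical example data: two short trips, deliberately out of time order.
quux <- data.frame(
  Trip_ID = c("B", "A", "A", "B", "B", "A"),
  Timestamp = c("19/01/2022 00:30", "18/01/2022 14:30", "18/01/2022 14:00",
                "18/01/2022 21:30", "19/01/2022 01:00", "18/01/2022 15:00"),
  Dist_to_Colony = c(390.8, 369.6, 370.0, 380.0, 384.4, 369.2)
)

# Parse day/month/year timestamps (UTC is an assumption) and sort within trips.
quux$Time <- as.POSIXct(quux$Timestamp, format = "%d/%m/%Y %H:%M", tz = "UTC")
quux <- quux[order(quux$Trip_ID, quux$Time), ]

# Hypothetical helper: label each row by its position relative to the
# first maximum-distance row of the trip.
classify <- function(d) {
  i <- seq_along(d)
  m <- which.max(d)
  ifelse(i < m, "OUT", ifelse(i == m, "MAX", "RET"))
}

# Apply the helper per trip and put the labels back in the original row order.
quux$Loc_Class <- unsplit(
  lapply(split(quux$Dist_to_Colony, quux$Trip_ID), classify),
  quux$Trip_ID
)
```

Sorting by a parsed date-time rather than by the raw string matters here, because day/month/year strings do not sort chronologically as text.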

The comp3 function is really just a ternary-result comparison function: instead of something like +(vec > val) that returns just 0 (false) and 1 (true), this gives a third result when they are equal. For example,

comp3(1:5, 4)
# [1] -1 -1 -1  0  1

The extension to that is the out= argument that allows the user to specify what the three values should be instead of -1:1. (If you want to shorten the dplyr code, feel free to hard-code the default value of out= to be your string vector.)
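
As a small sketch of that shortening, the labels can be made the default of a thin wrapper. The name comp3_dir is hypothetical, and comp3 is repeated from above only so the block runs on its own.

```r
# comp3 as defined in the answer, repeated so this block is self-contained.
comp3 <- function(vec, val, out = -1:1) ifelse(abs(vec - val) < 1e-9, out[2], ifelse(vec < val, out[1], out[3]))

# Hypothetical wrapper with the three labels as the default value of out=.
comp3_dir <- function(vec, val, out = c("OUT", "MAX", "RET")) comp3(vec, val, out)

comp3_dir(1:5, 4)
# [1] "OUT" "OUT" "OUT" "MAX" "RET"
```

With such a wrapper, the mutate() call would reduce to Direction = comp3_dir(row_number(), which.max(Dist_to_Colony)).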

Another note: the use of abs(vec - val) < 1e-9 is another step towards generalizing it: if given floating-point (numeric) values, we might be subject to problems with strict floating-point equality for numbers of high precision (cf. "Why are these numbers not equal?", "Is floating point math broken?", and https://en.wikipedia.org/wiki/IEEE_754). In this case it's a little overkill, since row_number() and which.max() both return integers, but it will not return a different value. (And since you talk of a table with 4000 or so locations, the "overhead" of doing this one extra step will likely not be human-apparent.)
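
To illustrate the floating-point point, here is a minimal base R sketch using the classic 0.1 + 0.2 case. The 1e-9 tolerance is the one used in comp3; all.equal() is shown only as a common alternative, not as part of the answer's approach.

```r
x <- 0.1 + 0.2

x == 0.3                   # FALSE: the two doubles differ in their last bits
abs(x - 0.3) < 1e-9        # TRUE: equal within the tolerance used by comp3
isTRUE(all.equal(x, 0.3))  # TRUE: all.equal() compares with its own tolerance
```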

Thank you, but this line doesn't factor in that there are 2 trips which will have different maximum points. (The column I added noting which stage the points are in (MAX, RET and OUT) are the correct classifications - however I've checked these manually for an example dataframe, and as my actual dataset is 4000 + locations and 40+ trips long, this won't be plausible.) — Ellie_Petrels, Nov 19 '22 at 18:04
