I have one continuous day of boat movement data that includes several trips. I want to label each trip with a unique ID. The start of a new trip can be detected because the time between successive points is larger than it is within a trip. Note that the time steps are not regular.

For example:

library(dplyr)
rep_data <- data.frame(
  t = c(1, 2, 3, 4, 5, 10, 12, 13, 14, 15, 16, 23, 24, 26, 28),   # time
  expect_output = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3)  # desired unique ID of the trip
)
rep_data <- rep_data %>%
  mutate(dif.time = t - lag(t, 1),
         gp = ifelse(dif.time > 5, 1, 0))

I tried with cumsum:

rep_data %>%
  mutate(daynum = cumsum(!duplicated(gp)))

I also tried with group_indices:

rep_data %>%
  group_by(dif.time) %>% 
  group_indices() 

I also tried cur_group_id.

But I am not even close to solving this simple challenge.

The column expect_output shows the result I want: three boat trips over the complete period.

Any idea how to get there? Any help will be greatly appreciated.

Thank you very much in advance,

Best regards, Marta

Martocas

1 Answer

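Based on your data, you need to choose a threshold that marks the start of a new trip. You can then take the difference between consecutive values and increment the trip counter whenever that difference reaches the threshold. The leading TRUE accounts for the first row, which has no previous value, so the IDs start at 1.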

threshold <- 5
rep_data$trip_id <- cumsum(c(TRUE, diff(rep_data$t) >= threshold))
rep_data

#    t trip_id
#1   1       1
#2   2       1
#3   3       1
#4   4       1
#5   5       1
#6  10       2
#7  12       2
#8  13       2
#9  14       2
#10 15       2
#11 16       2
#12 23       3
#13 24       3
#14 26       3
#15 28       3
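
Since the question uses dplyr, the same idea can probably be written inside a mutate() pipeline. This is only a sketch under assumptions: the threshold of 5 is the one chosen above, and the column names dif.time, new_trip and trip_id are illustrative, not anything dplyr requires.

```r
library(dplyr)

threshold <- 5  # assumed gap (in the units of t) that separates two trips

rep_data <- data.frame(
  t = c(1, 2, 3, 4, 5, 10, 12, 13, 14, 15, 16, 23, 24, 26, 28)
)

trips <- rep_data %>%
  mutate(
    # gap to the previous point; the first row has no predecessor, so its gap is 0
    dif.time = t - lag(t, default = first(t)),
    # a trip starts at the first row or wherever the gap reaches the threshold
    new_trip = row_number() == 1 | dif.time >= threshold,
    # running count of trip starts gives the trip ID
    trip_id  = cumsum(new_trip)
  )

trips
```

Note that the comparison is >= rather than >, because the gap between t = 5 and t = 10 is exactly 5 and should still start a new trip.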
Ronak Shah