I have a set of animal locations recorded at varying sampling intervals. I want to group and label the sequences in which the sampling interval meets a certain criterion (e.g. is below a certain value). This is a revision of this question, which was marked as a duplicate of this one. The difference in this revised question is that all values that do NOT meet the criterion should be ignored, not labeled.
Let me illustrate with some dummy data:
start <- Sys.time()
timediff <- c(rep(5,3),rep(20,3),rep(5,2))
timediff <- cumsum(timediff)
# Set up a dataframe with a couple of time values
df <- data.frame(TimeDate = start + timediff)
# For clarity, store the time difference to the next row (in seconds) in a separate column
df$TimeDiff <- c(as.numeric(diff(df$TimeDate), units = "secs"), NA)
Using @Josh O'Brien's answer, one can define a function that numbers groups of values meeting a specific criterion:
number.groups <- function(input) {
  input[is.na(input)] <- FALSE  # treat NA as not meeting the criterion
  # start a new group after every value that does not meet the criterion
  return(head(cumsum(c(TRUE, !input)), -1))
}
# Define the criterion and apply the function
df$Group <- number.groups(df$TimeDiff <= 5)
# output
             TimeDate TimeDiff Group
1 2016-03-16 15:41:51        5     1
2 2016-03-16 15:41:56        5     1
3 2016-03-16 15:42:01       20     1
4 2016-03-16 15:42:21       20     2
5 2016-03-16 15:42:41       20     3
6 2016-03-16 15:43:01        5     4
7 2016-03-16 15:43:06        5     4
8 2016-03-16 15:43:11       NA     4
The issue here is that rows 4 and 5 are each labeled as their own group rather than being ignored. Is there a way to make sure that values that do NOT belong to a group are NOT labeled (i.e. stay NA)?
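To make the goal concrete, here is a minimal sketch of one way the desired labeling might be produced. The helper name `label.sequences` and the fixed start time are assumptions made for this illustration and are not part of the linked answer. It assumes a row belongs to a sequence when the interval before it or after it meets the criterion, and it numbers sequences consecutively.

```r
# Fixed start time (an assumption, so the example is reproducible)
start <- as.POSIXct("2016-03-16 15:41:51", tz = "UTC")
timediff <- cumsum(c(rep(5, 3), rep(20, 3), rep(5, 2)))
df <- data.frame(TimeDate = start + timediff)

# Time difference to the next row, in seconds; the last row has none
df$TimeDiff <- c(as.numeric(diff(df$TimeDate), units = "secs"), NA)

# Hypothetical helper: number the runs that meet the criterion, leave the rest NA
label.sequences <- function(ok) {
  ok[is.na(ok)] <- FALSE                 # a missing interval counts as "no match"
  prev <- c(FALSE, head(ok, -1))         # did the interval BEFORE this row match?
  member <- ok | prev                    # row touches at least one matching interval
  starts <- member & !prev               # first row of each sequence
  ifelse(member, cumsum(starts), NA_integer_)
}

df$Group <- label.sequences(df$TimeDiff <= 5)
print(df$Group)
# [1]  1  1  1 NA NA  2  2  2
```

With this sketch, rows 4 and 5 stay NA, while row 3 and row 8 are kept as the closing points of their sequences. If the closing row of each sequence should be excluded instead, `member` could be reduced to `ok` alone.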