
For thousands of ids, I want to find, in a simple way, the days when each one starts being recorded and the days when it stops.

I currently use a loop, shown below, which works but takes ages.

An example of my dataset:

id date
1  2017-11-30
1  2017-12-01
1  2017-12-02
1  2017-12-03
1  2017-12-05
1  2017-12-06
1  2017-12-07
1  2017-12-08
1  2017-12-09
1  2017-12-10

I then use this loop to find, for each individual, the dates when a run of consecutive daily records starts and ends. In my example it gives '2017-11-30' and '2017-12-05' for the starts, and '2017-12-03' and '2017-12-10' for the ends.

library(dplyr)  # needed for bind_rows()

nani <- unique(dat$id)
#LOOP OVER UNIQUE IDS, NOT OVER EVERY ROW
n <- length(nani)
#SET THE NEW OBJECT WHERE TO SAVE RESULTS
NEWDAT <- NULL
for(i in 1 : n)
{
#SELECT ANIMALS I WITHIN THE DATA.FRAME
x <- which(dat$id == nani[i])

#FIND THE POSITIONS IN THE DATA FRAME WHERE THE DAILY RECORD IS NOT CONTINUOUS
diffx <- diff(diff(dat$date[x]))

#FIND THE POSITION OF STARTS FOR EACH SESSIONS OF RECORDS
starti <- which(diffx < 0) +1

#FIND THE POSITION OF ENDS FOR EACH SESSIONS OF RECORDS
endi <- which(diffx > 0) +1

#FIND THE DATES OF STARTS FOR EACH SESSIONS OF RECORDS
starts_records <- c(dat$date[x][1], dat$date[x][starti])

#FIND THE DATES OF ENDS FOR EACH SESSIONS OF RECORDS
ends_records <- c(dat$date[x][endi], dat$date[x][length(x)])

#CREATE LABELS
name_start <- rep("START_RECORDS_BY_SENSORS", length(starts_records))
name_end <- rep("END_RECORDS_BY_SENSORS", length(ends_records))

#CREATE THE NEW DATA.FRAME EXPECTED
dat2 <- data.frame( "event_start" = c(starts_records, ends_records), 
                    "name" = c(name_start, name_end))
dat2 <- dat2[order(dat2$event_start),]

#SAVE RESULTS
NEWDAT <- bind_rows(NEWDAT, dat2)
}

So far I have tried things like the line below, but I have not found the right way to avoid the loop.

NEWDAT <- dat %>% group_by(id) %>% summarize(diff_days = diff(diff(date)))

I still struggle to understand the syntax of dplyr.

VincentP

1 Answer

You can create a new group at every break in the dates, then take the first and last date in each group.

library(dplyr)

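# df is the question's data frame (dat); date must be of class Date, rows sorted by id and date
df %>%
  # grp increases by 1 each time the gap to the previous row exceeds one day
  group_by(id, grp = cumsum(c(TRUE, diff(date) > 1))) %>%
  summarise(start = first(date), stop = last(date))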

#     id   grp start      stop      
#  <int> <int> <date>     <date>    
#1     1     1 2017-11-30 2017-12-03
#2     1     2 2017-12-05 2017-12-10
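
For comparison, the same "new run at every break" idea can be sketched in base R without any grouping step. This is only an illustration under some assumptions: the data frame is rebuilt here from the question's sample, `date` is converted to class Date, rows are sorted by id and then date, and the names `is_start`, `is_end`, `res` and `events` are made up for the example.

```r
# Sample data from the question; dates are converted to class Date.
dat <- data.frame(
  id = 1L,
  date = as.Date(c("2017-11-30", "2017-12-01", "2017-12-02", "2017-12-03",
                   "2017-12-05", "2017-12-06", "2017-12-07", "2017-12-08",
                   "2017-12-09", "2017-12-10"))
)

# Assumption: rows must be ordered by id, then by date.
dat <- dat[order(dat$id, dat$date), ]
n <- nrow(dat)

# A row starts a run if it is the first row, if the id changes,
# or if the gap to the previous date is more than one day.
is_start <- c(TRUE,
              dat$id[-1] != dat$id[-n] | as.numeric(diff(dat$date)) > 1)

# A row ends a run if the next row starts one, or if it is the last row.
is_end <- c(is_start[-1], TRUE)

# One row per run: id, first day, last day.
res <- data.frame(id    = dat$id[is_start],
                  start = dat$date[is_start],
                  stop  = dat$date[is_end])

# Optional long layout, similar to the one built by the loop in the question.
events <- rbind(
  data.frame(id = res$id, event_start = res$start,
             name = "START_RECORDS_BY_SENSORS"),
  data.frame(id = res$id, event_start = res$stop,
             name = "END_RECORDS_BY_SENSORS")
)
events <- events[order(events$id, events$event_start), ]
```

Because every step is a vectorised comparison over the whole column, there is no per-id loop; for the sample data `res` holds the two runs 2017-11-30 to 2017-12-03 and 2017-12-05 to 2017-12-10.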
Ronak Shah