Start and end dates of time periods defined by a column in a data frame

Question

I have a data frame of hourly data organized in rows and would like to reshape it so as to obtain the start and end times of the periods during which the data meet a certain criterion.

Consider the following example: one column holds the sequential hourly times, and the second column holds the dummy variable data.

Yrs <- data.frame(Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"), as.POSIXct("2019-02-04 23:00:00", tz = "UTC"), by = "hour"))
Yrs$Var <- c(1:12, 1:11)

I would like to obtain the start and end dates of each period in which Var was between, say, 3 and 7 (inclusive).

Expected result:

StartDate               EndDate
2019-02-04 03:00:00     2019-02-04 07:00:00
2019-02-04 15:00:00     2019-02-04 19:00:00

I figure I can create a new column indicating the rows where the criterion is met, but I do not know how to get the start and end of those consecutive periods:

Yrs$Period= ifelse(Yrs$Var >= 3 & Yrs$Var <=7, 1, 0)

I found the reverse of this problem in "Given start date and end date, reshape/expand data for each day between (each day on a row)", but I am struggling to figure this out. Any help will be greatly appreciated.

score 0 · Accepted Answer · answered Feb 07 '19 at 00:19

Maybe something like:

library(data.table)
setDT(Yrs)[, .(StartDate=Date[Var==3L], EndDate=Date[Var==7L]), 
    by=.(c(0L, cumsum(diff(Var) < 1L)))][, -1L]

output:

             StartDate             EndDate
1: 2019-02-04 03:00:00 2019-02-04 07:00:00
2: 2019-02-04 15:00:00 2019-02-04 19:00:00
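
To see how the grouping key inside `by=` splits the rows, here is a minimal base R sketch that rebuilds the question's `Yrs` and computes the key step by step. The name `grp` is introduced here only for illustration; it does not appear in the answer's code.

```r
# Rebuild the example data from the question
Yrs <- data.frame(Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"),
                             as.POSIXct("2019-02-04 23:00:00", tz = "UTC"),
                             by = "hour"))
Yrs$Var <- c(1:12, 1:11)

# diff(Var) < 1 is TRUE only where Var stops increasing (the reset from 12 to 1).
# cumsum() turns each reset into a new group id; the leading 0 labels the first row.
grp <- c(0L, cumsum(diff(Yrs$Var) < 1L))

table(grp)  # 12 rows in group 0, 11 rows in group 1
```

Within each group, `Date[Var==3L]` and `Date[Var==7L]` each pick one row, so this approach assumes Var takes exactly the values 3 and 7 once per increasing run, as it does in the example data.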

chinsoon12

Nice! Thank you. – Camilo Feb 07 '19 at 07:06

score 0 · Answer 2 · answered Feb 07 '19 at 00:40

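Why not filter and spread?

library(dplyr)
library(tidyr)  # spread() comes from tidyr

Yrs %>%
  filter(Var == 3 | Var == 7) %>%   # keep only the boundary rows
  group_by(Var) %>%
  mutate(ind = row_number()) %>%    # number each occurrence so starts and ends pair up
  ungroup() %>%
  spread(Var, Date) %>%             # one column per Var value: `3` and `7`
  select(-ind) %>%
  rename(Start_Date = `3`, End_Date = `7`)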

#  Start_Date          End_Date           
#  <dttm>              <dttm>             
#1 2019-02-04 03:00:00 2019-02-04 07:00:00
#2 2019-02-04 15:00:00 2019-02-04 19:00:00
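
Filtering on `Var == 3 | Var == 7` works here because the example data hit both bounds exactly. As a hedged alternative, the sketch below uses base R's `rle()` on the in-range condition instead, so it should also cope with values that skip over the bounds. The names `in_range`, `runs`, `starts`, `ends` and `periods` are introduced here for illustration only.

```r
# Rebuild the example data from the question
Yrs <- data.frame(Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"),
                             as.POSIXct("2019-02-04 23:00:00", tz = "UTC"),
                             by = "hour"))
Yrs$Var <- c(1:12, 1:11)

# Logical flag for rows that meet the criterion (bounds inclusive)
in_range <- Yrs$Var >= 3 & Yrs$Var <= 7

# Run-length encode the flag: consecutive TRUE/FALSE stretches and their lengths
runs   <- rle(in_range)
ends   <- cumsum(runs$lengths)         # row index where each run ends
starts <- ends - runs$lengths + 1L     # row index where each run starts

# Keep only the runs in which the criterion holds
periods <- data.frame(StartDate = Yrs$Date[starts[runs$values]],
                      EndDate   = Yrs$Date[ends[runs$values]])
periods
```

On the example data this yields the same two periods as the output above (03:00 to 07:00 and 15:00 to 19:00).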
