I have a database of hourly data organized in rows and would like to reshape it so as to obtain the start and end times of the periods in which the data meet a certain criterion.

Consider the following example: the first column holds sequential hourly times and the second column holds dummy variable data.

Yrs=  data.frame(Date=seq(as.POSIXct("2019-02-04 01:00:00",tz="UTC"), as.POSIXct("2019-02-04 23:00:00",tz="UTC"), by="hour"))
Yrs$Var=c(1:12,1:11)

I would like to obtain the start and end dates of each period in which the variable was between, say, 3 and 7.

Expected result:

StartDate               EndDate
2019-02-04 03:00:00     2019-02-04 07:00:00
2019-02-04 15:00:00     2019-02-04 19:00:00

I figure I can create a new column flagging the rows where the criterion is met, but I do not know how to get the start and end of those consecutive periods:

Yrs$Period= ifelse(Yrs$Var >= 3 & Yrs$Var <=7, 1, 0)

I found the reverse of this problem in "Given start date and end date, reshape/expand data for each day between (each day on a row)", but I am struggling to adapt it. Any help will be greatly appreciated.

Ronak Shah
Camilo

2 Answers

Maybe something like:

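library(data.table)
# Start a new group each time Var stops increasing (i.e. the sequence restarts),
# then pick the dates where Var equals the lower and upper bounds in each group.
setDT(Yrs)[, .(StartDate=Date[Var==3L], EndDate=Date[Var==7L]), 
    by=.(c(0L, cumsum(diff(Var) < 1L)))][, -1L]  # drop the grouping column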

output:

             StartDate             EndDate
1: 2019-02-04 03:00:00 2019-02-04 07:00:00
2: 2019-02-04 15:00:00 2019-02-04 19:00:00
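
The approach above depends on Var taking exactly the values 3 and 7 within each run. If that cannot be guaranteed, one possible variation is to build the flag column suggested in the question and group consecutive runs of it with data.table's rleid(). This is only a sketch: the names DT, in_range, run and res are made up for illustration, and it assumes the rows are already sorted by Date.

```r
library(data.table)

# Rebuild the example data from the question
Yrs <- data.frame(Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"),
                             as.POSIXct("2019-02-04 23:00:00", tz = "UTC"),
                             by = "hour"))
Yrs$Var <- c(1:12, 1:11)

DT <- as.data.table(Yrs)

# Flag rows that meet the criterion (hypothetical column name)
DT[, in_range := Var >= 3 & Var <= 7]

# rleid() gives each consecutive run of identical flag values its own id
DT[, run := rleid(in_range)]

# Keep only the runs where the flag is TRUE and take the first and last
# Date of each run (assumes rows are ordered by Date)
res <- DT[in_range == TRUE,
          .(StartDate = Date[1L], EndDate = Date[.N]),
          by = run][, .(StartDate, EndDate)]

print(res)
```

On the example data this should give the same two periods as the expected result in the question, and it should keep working if Var skips over 3 or 7.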
chinsoon12

Why not filter and spread?

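library(dplyr)
library(tidyr)  # spread() comes from tidyr

Yrs %>%
  filter(Var == 3 | Var == 7) %>%            # keep only the boundary rows
  group_by(Var) %>%
  mutate(ind = row_number()) %>%             # index each occurrence of 3 and of 7
  ungroup() %>%
  spread(Var, Date) %>%                      # one column per boundary value
  select(-ind) %>%
  rename(Start_Date = `3`, End_Date = `7`)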

#  Start_Date          End_Date           
#  <dttm>              <dttm>             
#1 2019-02-04 03:00:00 2019-02-04 07:00:00
#2 2019-02-04 15:00:00 2019-02-04 19:00:00
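
Like any filter on Var == 3 | Var == 7, this only works when both boundary values actually occur in every period. A more general sketch, staying within dplyr, is to flag the rows that meet the criterion and number the consecutive runs of that flag with cumsum(). The names in_range, run and periods are hypothetical, and the rows are assumed to be sorted by Date.

```r
library(dplyr)

# Rebuild the example data from the question
Yrs <- data.frame(Date = seq(as.POSIXct("2019-02-04 01:00:00", tz = "UTC"),
                             as.POSIXct("2019-02-04 23:00:00", tz = "UTC"),
                             by = "hour"))
Yrs$Var <- c(1:12, 1:11)

periods <- Yrs %>%
  mutate(
    # TRUE where the criterion is met (hypothetical column name)
    in_range = Var >= 3 & Var <= 7,
    # the counter increases every time the flag changes value,
    # so each consecutive run of equal flags shares one id
    run = cumsum(in_range != lag(in_range, default = first(in_range)))
  ) %>%
  filter(in_range) %>%
  group_by(run) %>%
  # first and last Date of each run (assumes rows are ordered by Date)
  summarise(Start_Date = first(Date), End_Date = last(Date)) %>%
  select(-run)

print(periods)
```

For the example data this should return the same two rows as above, without relying on Var hitting 3 and 7 exactly.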
Ronak Shah