My question is basically a follow-up on the question previously raised here by user Wet Feet:

This is the modified dataset:

      date_time loc_id node  energy   kgco2 
1 2009-02-27 00:11:08     87  103 0.00000 0.00000 
2 2009-02-27 01:05:05     87  103 7.00000 3.75900 
3 2009-02-27 02:05:05     87  103 6.40039 3.43701 
4 2009-02-28 02:10:05     87  103 4.79883 2.57697 
5 2009-02-28 04:05:05     87  103 4.10156 2.20254 
6 2009-02-28 05:05:05     87  103 2.59961 1.39599
7 2009-03-01 03:20:05     87  103 2.59961 1.39599

I am trying to keep only those rows whose time of day falls within a specific interval, e.g. 02:05:00 to 03:30:00, regardless of the date. The expected result is:

3 2009-02-27 02:05:05     87  103 6.40039 3.43701 
4 2009-02-28 02:10:05     87  103 4.79883 2.57697
7 2009-03-01 03:20:05     87  103 2.59961 1.39599

Applying the solution in the linked question (hour from the lubridate package) doesn't suffice, however, since I also have to consider the minutes of my interval. I could use the interval function from the lubridate package to include minutes, but an interval is tied to specific dates and my data frame spans several dates, so it wouldn't help.

I am particularly curious whether there is a solution that makes use of dplyr's filter verb. Or is using the xts package the only way forward?

eipi10
zoowalk
  • I think you're going to have to use `minute` with some `or`'s and `and`'s: `filter((hour(date_time) == 2 & minute(date_time) >= 5) | (hour(date_time) == 3 & minute(date_time) <= 30))` – Hugh Aug 21 '17 at 22:45
  • Relevant, possibly a duplicate - https://stackoverflow.com/questions/44251861/filter-all-days-between-a-time-range-in-r/44251952 – thelatemail Aug 21 '17 at 22:53

1 Answer

In the code below, for each row I calculate the number of seconds since midnight and check whether that value is within the time range in your question, also converted to seconds since midnight. I've included the code to set up the data with a datetime format (and UTC time zone) since the data sample wasn't provided in reproducible form.

1. Set up the data frame

library(lubridate)
library(tidyverse)

dat = read.table(text="date_time time loc_id node  energy   kgco2 
                 1 2009-02-27 00:11:08     87  103 0.00000 0.00000 
                 2 2009-02-27 01:05:05     87  103 7.00000 3.75900 
                 3 2009-02-27 02:05:05     87  103 6.40039 3.43701 
                 4 2009-02-28 02:10:05     87  103 4.79883 2.57697 
                 5 2009-02-28 04:05:05     87  103 4.10156 2.20254 
                 6 2009-02-28 05:05:05     87  103 2.59961 1.39599
                 7 2009-03-01 03:20:05     87  103 2.59961 1.39599", 
                 header=TRUE, stringsAsFactors=FALSE)

# Join the date and time columns with a space before parsing
dat$date_time = as.POSIXct(paste(dat$date_time, dat$time), tz="UTC")
dat = dat %>% select(-time)

2. Helper function to convert hms time strings to seconds since midnight

hms_to_numeric = function(x) {
  x = as.POSIXct(paste("2010-01-01", x))
  3600 * hour(x) + 60 * minute(x) + second(x)
}
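
As a quick sanity check on the conversion, the two interval boundaries from the question should come out as 7500 and 12600 seconds since midnight. The sketch below redoes the arithmetic in base R only; `to_secs` is a hypothetical name used for illustration and is not part of the answer's code or of lubridate.

```r
# Hypothetical base-R helper: "HH:MM:SS" string -> seconds since midnight
to_secs <- function(x) {
  parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

lower <- to_secs("02:05:00")  # 2 * 3600 + 5 * 60
upper <- to_secs("03:30:00")  # 3 * 3600 + 30 * 60
c(lower, upper)
```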

3. Filter the data to include only rows within the time range

dat %>% 
  # Seconds since midnight = timestamp minus midnight of the same day
  filter(between(as.numeric(date_time) - as.numeric(as.POSIXct(substr(date_time,1,10), tz="UTC")), 
                 hms_to_numeric("02:05:00"), 
                 hms_to_numeric("03:30:00")))

            date_time loc_id node  energy   kgco2
1 2009-02-27 02:05:05     87  103 6.40039 3.43701
2 2009-02-28 02:10:05     87  103 4.79883 2.57697
3 2009-03-01 03:20:05     87  103 2.59961 1.39599
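
A possible variation on the same idea, offered as a sketch rather than a tested drop-in: since lubridate's `hour()`, `minute()` and `second()` accept a datetime column directly, the seconds since midnight can be computed from `date_time` itself. The helper name `secs_since_midnight` is hypothetical, and the data frame is rebuilt here with only two columns so that the example is self-contained.

```r
library(dplyr)
library(lubridate)

# Sample timestamps and energy values from the question, parsed as UTC
dat <- data.frame(
  date_time = as.POSIXct(c("2009-02-27 00:11:08", "2009-02-27 01:05:05",
                           "2009-02-27 02:05:05", "2009-02-28 02:10:05",
                           "2009-02-28 04:05:05", "2009-02-28 05:05:05",
                           "2009-03-01 03:20:05"), tz = "UTC"),
  energy = c(0, 7, 6.40039, 4.79883, 4.10156, 2.59961, 2.59961)
)

# Hypothetical helper: seconds elapsed since midnight for a datetime vector
secs_since_midnight <- function(x) {
  3600 * hour(x) + 60 * minute(x) + second(x)
}

# Keep rows whose time of day lies in [02:05:00, 03:30:00], on any date
res <- dat %>%
  filter(between(secs_since_midnight(date_time),
                 2 * 3600 + 5 * 60,     # 02:05:00
                 3 * 3600 + 30 * 60))   # 03:30:00
res
```

Note that `between()` is inclusive at both ends, so a timestamp of exactly 03:30:00 would be kept.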
eipi10