subset data by time interval if I have all data between time interval

Question

I have a data frame that looks like this:

      X id mat.1      mat.2       mat.3               times
1     1  1  Anne 1495206060  18.5639404 2017-05-19 11:01:00
2     2  1  Anne 1495209660   9.0160321 2017-05-19 12:01:00
3     3  1  Anne 1495211460  37.6559161 2017-05-19 12:31:00
4     4  1  Anne 1495213260  31.1218856 2017-05-19 13:01:00
....
164 164  1  Anne 1497825060   4.8098351 2017-06-18 18:31:00
165 165  1  Anne 1497826860  15.0678781 2017-06-18 19:01:00
166 166  1  Anne 1497828660   4.7636241 2017-06-18 19:31:00

What I would like is to subset the data set by time interval (all data between 11 AM and 4 PM) if there are data points for each hour at least (11 AM, 12, 1, 2, 3, 4 PM) within each day. I want to ultimately sum the values from mat.3 per time interval (11 AM to 4 PM) per day.

I did tried:

sub.1 <- subset(t,format(times,'%H')>='11' & format(times,'%H')<='16')

but this returns all the data from any of the times between 11 AM and 4 PM, but often I would only have data for e.g. 12 and 1 PM for a given day.

I only want the subset from days where I have data for each hour from 11 AM to 4 PM. Any ideas what I can try?

Possible duplicate of [Subset xts object by time of day](https://stackoverflow.com/questions/13912282/subset-xts-object-by-time-of-day) — Eric Fail, Feb 07 '18 at 17:05

score 0 · Answer 1 · answered Feb 07 '18 at 17:24

Try to create a new variable in your data frame with only the hour.

your_data$hour<-format(your_data$times, format="%H:%M:%S")

Then, using this new variable try to do the next:

#auxiliar variable with your interval of time
your_data$aux_var<-ifelse(your_data$hour >"11:00:00" || your_data$hour<"16:00:00" ,1,0)

So, the next step is filter your data when aux_var==1

your_data[which(your_data$aux_var ==1),]

score 0 · Accepted Answer · answered Feb 08 '18 at 07:15

A complement to @Henry Navarro answer for solving an additional problem mentioned in the question.

If I understand in proper way, another concern of the question is to find the dates such that there are data points at least for each hour of the given interval within the day. A possible way following the style of @Henry Navarro solution is as follows:

library(lubridate)
your_data$hour_only <- as.numeric(format(your_data$times, format = "%H"))
your_data$days <- ymd(format(your_data$times, "%Y-%m-%d"))
your_data_by_days_list <- split(x = your_data, f = your_data$days)

# the interval is narrowed for demonstration purposes
hours_intervals <- 11:13
all_hours_flags <- data.frame(days = unique(your_data$days), 
    all_hours_present = sapply(function(Z) (sum(unique(Z$hour_only) %in% hours_intervals) >= 
    length(hours_intervals)), X = your_data_by_days_list), row.names = NULL)
your_data <- merge(your_data, all_hours_flags, by = "days")

There is now the column "all_hours_present" indicating that the data for a corresponding day contains at least one value for each hour in the given hours_intervals. And you may use this column to subset your data

subset(your_data, all_hours_present)

Thank you! This was very helpful! – Anni Feb 08 '18 at 14:18 — Anni, Feb 08 '18 at 14:18

subset data by time interval if I have all data between time interval

2 Answers2