Need help filtering days by their number of hourly values in a time series

Question

I have a number of days, each with 24 hours. How can I extract the days according to how many hourly values they actually have? If a day has more than 15 hourly values, I keep that day; otherwise, if it has fewer than 15 hourly values, I ignore it.

ts <- seq(as.POSIXct("2015-08-06 12:00"), as.POSIXct("2015-08-21 17:30"), by = '60 min')

# one random value (or NA) per hourly timestamp
x1 <- sample(c(NA, 1, 4, 5), length(ts), replace = TRUE)
DF <- data.frame(ts, x1)
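A minimal base-R sketch of the rule as stated in the question (keep a day only if it has more than 15 non-NA hourly values), applied to example data like the above. Assumptions not in the question: `set.seed(1)` for reproducibility, `tz = "UTC"` so that `as.Date()` and the timestamps agree on where a day starts (depending on the R version, `as.Date()` on a POSIXct may otherwise use UTC while the timestamps are in local time), and `ave()` in place of a package; `n_valid` and `DF_kept` are illustrative names.

```r
# Rebuild the question's example data (set.seed and tz = "UTC" are assumptions)
set.seed(1)
ts <- seq(as.POSIXct("2015-08-06 12:00", tz = "UTC"),
          as.POSIXct("2015-08-21 17:30", tz = "UTC"), by = "60 min")
x1 <- sample(c(NA, 1, 4, 5), length(ts), replace = TRUE)
DF <- data.frame(ts, x1)

# Calendar day of each timestamp
day <- as.Date(DF$ts, tz = "UTC")

# Number of non-NA x1 values in each row's day, repeated on every row of that day
n_valid <- ave(as.integer(!is.na(DF$x1)), day, FUN = sum)

# Keep rows belonging to days with more than 15 non-NA hourly values
DF_kept <- DF[n_valid > 15, ]
```

Because the first day only starts at 12:00, it has at most 12 hourly values and can never pass the threshold.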

If you have a data frame loaded in your R environment, run `dput(your_dataframe)` and paste the output as text into your question — Chamkrai, Jul 03 '22 at 13:03
Or if that's too long, run `dput(head(YOUR_DATAFRAME, 10))` to make a code "recipe" you can paste into your question, which we can run to create an exact copy of the first 10 rows of `YOUR_DATAFRAME`. The answer will depend somewhat on the specific format of data you have. — Jon Spring, Jul 03 '22 at 18:10

Caspar V. · Answer 1 · 2022-07-04T12:03:20.497

You didn't mention what your data looks like, nor what to do when there are exactly 15 'hours values', so I made some assumptions:

example data

set.seed(3)
df <- data.frame( timestamp = sort(lubridate::as_datetime( sample(1656492684:1656892684, 100) )),
                  value = runif(100))

            timestamp      value
1 2022-06-29 10:16:59 0.55691665
2 2022-06-29 10:50:13 0.61934743
3 2022-06-29 13:56:17 0.93225700
4 2022-06-29 13:56:53 0.67114286
5 2022-06-29 14:24:20 0.05132358
 [ reached 'max' / getOption("max.print") -- omitted 95 rows ]

code

library('dplyr')
df %>%
  
  # group by date
  group_by( date = as.Date(timestamp) ) %>%
  
  # for each group, get all hours, count number of unique hours
  # (2x the same hour only counts as one), keep only groups with
  # 15 or more unique hours 
  filter( n_distinct(lubridate::hour(timestamp)) >= 15 ) %>%
  
  # remove intermediate column
  ungroup() %>% select(-date)
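For comparison, a base-R sketch of the same "distinct hours per day" rule, assuming neither dplyr nor lubridate is installed. It rebuilds the example data with `as.POSIXct(..., origin = "1970-01-01", tz = "UTC")` in place of `lubridate::as_datetime()`; the names `hours_per_day` and `keep_days` are illustrative.

```r
# Example data as above, but without lubridate (UTC timestamps assumed)
set.seed(3)
df <- data.frame(
  timestamp = sort(as.POSIXct(sample(1656492684:1656892684, 100),
                              origin = "1970-01-01", tz = "UTC")),
  value = runif(100)
)

day  <- format(df$timestamp, "%Y-%m-%d", tz = "UTC")
hour <- format(df$timestamp, "%H", tz = "UTC")

# Number of distinct clock hours observed on each day
# (two readings in the same hour count once)
hours_per_day <- tapply(hour, day, function(h) length(unique(h)))

# Keep rows whose day has 15 or more distinct hours
keep_days <- names(hours_per_day)[hours_per_day >= 15]
result <- df[day %in% keep_days, ]
```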

result

            timestamp      value
1 2022-06-30 00:07:23 0.92558114
2 2022-06-30 00:52:01 0.18972964
3 2022-06-30 01:25:31 0.35458337
4 2022-06-30 01:25:50 0.09570177
5 2022-06-30 01:37:28 0.07627256
 [ reached 'max' / getOption("max.print") -- omitted 51 rows ]

other example data

Create an example data frame with about 30% missing values:

set.seed(3)
df <- data.frame( timestamp = seq(as.POSIXct('2022-01-01', tz='UTC'),
                                  as.POSIXct('2022-01-10 23:00', tz='UTC'),
                                  by = '1 hour'),
                  value = runif(240))
df$value[runif(nrow(df)) < 0.3] <- NA

             timestamp     value
1  2022-01-01 00:00:00 0.3833159
2  2022-01-01 01:00:00        NA
3  2022-01-01 02:00:00        NA
4  2022-01-01 03:00:00 0.5453477
5  2022-01-01 04:00:00        NA
6  2022-01-01 05:00:00 0.3511720
7  2022-01-01 06:00:00 0.2766057
8  2022-01-01 07:00:00        NA
9  2022-01-01 08:00:00 0.3768846
10 2022-01-01 09:00:00 0.6506105
 [ reached 'max' / getOption("max.print") -- omitted 230 rows ]

code

library('dplyr')
df %>%
  
  # create `date` column and group by it
  group_by( date = as.Date(timestamp) ) %>%
  
  # create column with number of non-NA values per date
  mutate( non.na.values = sum(!is.na(value))) %>%
  
  # keep only rows whose date has 15 or more non-NA values
  filter( non.na.values >= 15 ) %>%
  
  # optional: ungroup and remove intermediate columns
  ungroup() %>%
  select(-date, -non.na.values)
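To make the "exactly 15" choice explicit and to check which days pass, here is a sketch of a hypothetical helper, `filter_days_by_coverage()` (not from any package), that takes the threshold and whether it is strict as arguments. It uses base R only and assumes UTC timestamps, as in the example data above.

```r
# Hypothetical helper: keep rows of `data` whose calendar day has at least
# `min_values` non-NA entries in `value_col`; strict = TRUE means "more than".
filter_days_by_coverage <- function(data, time_col, value_col,
                                    min_values = 15, strict = FALSE, tz = "UTC") {
  day  <- format(data[[time_col]], "%Y-%m-%d", tz = tz)
  n_ok <- ave(as.integer(!is.na(data[[value_col]])), day, FUN = sum)
  keep <- if (strict) n_ok > min_values else n_ok >= min_values
  data[keep, , drop = FALSE]
}

# Same example data as above
set.seed(3)
df <- data.frame(
  timestamp = seq(as.POSIXct("2022-01-01", tz = "UTC"),
                  as.POSIXct("2022-01-10 23:00", tz = "UTC"), by = "1 hour"),
  value = runif(240)
)
df$value[runif(nrow(df)) < 0.3] <- NA

# Per-day count of non-NA values, to see which days pass the threshold
print(table(format(df$timestamp[!is.na(df$value)], "%Y-%m-%d", tz = "UTC")))

at_least_15  <- filter_days_by_coverage(df, "timestamp", "value", min_values = 15)
more_than_15 <- filter_days_by_coverage(df, "timestamp", "value",
                                        min_values = 15, strict = TRUE)
```

The `strict` flag corresponds to the question's wording ("more than 15"), while the default matches the answer's `>= 15`.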

result

             timestamp     value
1  2022-01-07 00:00:00 0.7469746
2  2022-01-07 01:00:00 0.4626171
3  2022-01-07 02:00:00        NA
4  2022-01-07 03:00:00 0.6663023
5  2022-01-07 04:00:00        NA
6  2022-01-07 05:00:00 0.9273060
7  2022-01-07 06:00:00        NA
8  2022-01-07 07:00:00 0.7554021
9  2022-01-07 08:00:00        NA
10 2022-01-07 09:00:00 0.1475389
 [ reached 'max' / getOption("max.print") -- omitted 14 rows ]

I am so sorry for the misunderstanding. What I meant is: each day has 24 hourly values, and the Value column contains NAs. I need to compute statistics, so if a day has more than 15 hourly values that are numbers (not NA), I will keep that day; if it has fewer than 15 numeric values, that day will be ignored. — Jon, Jul 04 '22 at 10:58
@Jon I've added another example that fits what you describe. For future reference, please take the time to read [how to ask a good question](https://stackoverflow.com/help/how-to-ask), and check out the answers to [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Caspar V., Jul 04 '22 at 12:04