I have data covering a number of days, and each day has up to 24 hourly values. How can I extract days according to how many hourly values they have available? If a day has more than 15 hourly values, I want to keep that day; otherwise, I want to ignore it.

ts <- seq(as.POSIXct("2015-08-06 12:00"), as.POSIXct("2015-08-21 17:30"), by='60 min')
x1 <- sample(c(NA,1,4,5),366,replace=TRUE)
DF <- data.frame(ts, x1)

Jon
  • Do you have some sample data? – Chamkrai Jul 03 '22 at 12:04
  • OK, I will attach a CSV file. – Jon Jul 03 '22 at 12:06
  • How can I upload the CSV file here? – Jon Jul 03 '22 at 12:50
  • If you have a dataframe loaded in your R environment, use `dput(your dataframe)` and post it as text in your post – Chamkrai Jul 03 '22 at 13:03
  • Or if that's too long, run `dput(head(YOUR_DATAFRAME, 10))` to make a code "recipe" you can paste into your question, which we can run to create an exact copy of the first 10 rows of `YOUR_DATAFRAME`. The answer will depend somewhat on the specific format of data you have. – Jon Spring Jul 03 '22 at 18:10

1 Answer

You didn't mention what your data looks like, nor what to do when there are exactly 15 'hours values', so I made some assumptions:

example data

set.seed(3)
df <- data.frame( timestamp = sort(lubridate::as_datetime( sample(1656492684:1656892684, 100) )),
                  value = runif(100))
            timestamp      value
1 2022-06-29 10:16:59 0.55691665
2 2022-06-29 10:50:13 0.61934743
3 2022-06-29 13:56:17 0.93225700
4 2022-06-29 13:56:53 0.67114286
5 2022-06-29 14:24:20 0.05132358
 [ reached 'max' / getOption("max.print") -- omitted 95 rows ]

code

library('dplyr')
df %>%
  
  # group by date
  group_by( date = as.Date(timestamp) ) %>%
  
  # for each group, get all hours, count number of unique hours
  # (2x the same hour only counts as one), keep only groups with
  # 15 or more unique hours 
  filter( n_distinct(lubridate::hour(timestamp)) >= 15 ) %>%
  
  # remove intermediate column
  ungroup() %>% select(-date)

result

            timestamp      value
1 2022-06-30 00:07:23 0.92558114
2 2022-06-30 00:52:01 0.18972964
3 2022-06-30 01:25:31 0.35458337
4 2022-06-30 01:25:50 0.09570177
5 2022-06-30 01:37:28 0.07627256
 [ reached 'max' / getOption("max.print") -- omitted 51 rows ]

other example data

create example data frame with ~ 30% missing values

set.seed(3)
df <- data.frame( timestamp = seq(as.POSIXct('2022-01-01', tz='UTC'),as.POSIXct('2022-01-10 23:00', tz='UTC'), by = '1 hour') ,
                  value = runif(240))
df$value[runif(nrow(df)) < 0.3] <- NA
             timestamp     value
1  2022-01-01 00:00:00 0.3833159
2  2022-01-01 01:00:00        NA
3  2022-01-01 02:00:00        NA
4  2022-01-01 03:00:00 0.5453477
5  2022-01-01 04:00:00        NA
6  2022-01-01 05:00:00 0.3511720
7  2022-01-01 06:00:00 0.2766057
8  2022-01-01 07:00:00        NA
9  2022-01-01 08:00:00 0.3768846
10 2022-01-01 09:00:00 0.6506105
 [ reached 'max' / getOption("max.print") -- omitted 230 rows ]

code

library('dplyr')
df %>%
  
  # create `date` column and group by it
  group_by( date = as.Date(timestamp) ) %>%
  
  # create column with number of non-NA values per date
  mutate( non.na.values = sum(!is.na(value))) %>%
  
  # keep only rows with 15 or more non.na.values in date 
  filter( non.na.values >= 15 ) %>%
  
  # optional: ungroup and remove intermediate columns
  ungroup() %>%
  select(-date, -non.na.values)

result

             timestamp     value
1  2022-01-07 00:00:00 0.7469746
2  2022-01-07 01:00:00 0.4626171
3  2022-01-07 02:00:00        NA
4  2022-01-07 03:00:00 0.6663023
5  2022-01-07 04:00:00        NA
6  2022-01-07 05:00:00 0.9273060
7  2022-01-07 06:00:00        NA
8  2022-01-07 07:00:00 0.7554021
9  2022-01-07 08:00:00        NA
10 2022-01-07 09:00:00 0.1475389
 [ reached 'max' / getOption("max.print") -- omitted 14 rows ]
Caspar V.
  • I am so sorry for the misunderstanding. What I meant is this: each day has 24 hourly values, and the Value column contains NAs. I then have to compute statistics. If a day has more than 15 hourly values that are numbers (not NA), with the rest being NA, I will consider that day. If a day has fewer than 15 numeric values, that day will be ignored. – Jon Jul 04 '22 at 10:58
  • @Jon I've added another example that fits what you describe. For future reference, please take the time to read [how to ask a good question](https://stackoverflow.com/help/how-to-ask), and check out the answers to [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Caspar V. Jul 04 '22 at 12:04
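
For reference, the same idea as the second example can be sketched in base R on the sample data from the question. This is only a sketch under a few assumptions that are not in the original thread: the `set.seed()` call and `tz = "UTC"` are added for reproducibility, `length(ts)` stands in for the hard-coded 366, the names `day`, `n_valid` and `DF_kept` are made up for illustration, and the threshold is taken as strictly "more than 15" as worded in the question (use `>=` to match the answer's code instead).

```r
set.seed(1)  # assumption: seed added only so the sketch is reproducible

# sample data as in the question: hourly timestamps, values with random NAs
# (tz = "UTC" is an assumption, added to avoid local time zone effects)
ts <- seq(as.POSIXct("2015-08-06 12:00", tz = "UTC"),
          as.POSIXct("2015-08-21 17:30", tz = "UTC"), by = "60 min")
x1 <- sample(c(NA, 1, 4, 5), length(ts), replace = TRUE)
DF <- data.frame(ts, x1)

# calendar day of each row, as a character label such as "2015-08-06"
day <- format(DF$ts, "%Y-%m-%d")

# number of non-NA values in each day, repeated on every row of that day
n_valid <- ave(as.integer(!is.na(DF$x1)), day, FUN = sum)

# keep only the rows of days with strictly more than 15 non-NA values
DF_kept <- DF[n_valid > 15, ]

# per-day non-NA counts, to check which days pass the threshold
tapply(!is.na(DF$x1), day, sum)
```

Note that the first day in this sample data only starts at 12:00, so it has 12 timestamps in total and can never pass the threshold, regardless of how many of its values are NA.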