
I have a data frame of hourly averaged values from 6 sensors, recorded over a period of about a year. There are 5 sites, and each site has its own group of the 6 sensors (for example, site_id arc1045 has sensors 0a, 0b, 0c, 0d, 0e, 0f; site_id arc1046 also has sensors 0a, 0b, 0c, 0d, 0e, 0f; and so on).

   site_id sensor_id datetime            hourly_avg
   <chr>   <chr>     <dttm>                   <dbl>
 1 arc1045 0a        2019-11-15 09:00:00       3.67
 2 arc1045 0a        2019-11-15 10:00:00       4.68
 3 arc1045 0a        2019-11-15 11:00:00       5.63
 4 arc1045 0a        2019-11-15 12:00:00       5.8 
 5 arc1045 0a        2019-11-15 13:00:00       6.32
 6 arc1045 0a        2019-11-15 14:00:00       5.28
 7 arc1045 0a        2019-11-15 15:00:00       6.52
 8 arc1045 0a        2019-11-15 16:00:00       5.72
 9 arc1045 0a        2019-11-15 17:00:00       8.43
10 arc1045 0a        2019-11-15 18:00:00       6.62

However, some hourly averages are missing. I want to identify the missing hours and append a row with NA values for each of them, checking every hourly interval for each site_id and sensor_id combination from the start date (2019-11-15 09:00:00) to the end date (2020-08-25 15:00:11).

I can do this by looping through the data frame hour by hour, but is there an easier way to do it with an R package?

z star


You can use complete() from tidyr to fill in the missing hours. The rows it adds get NA in hourly_avg.

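library(dplyr)
library(tidyr)

df %>%
  # fill the gaps separately for every site/sensor combination
  group_by(site_id, sensor_id) %>%
  # add a row for every hour between the group's first and last reading;
  # the added rows get NA in hourly_avg
  complete(datetime = seq(min(datetime), max(datetime), by = 'hour'))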
Ronak Shah
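Because the grouped code above builds the sequence from each group's own min(datetime) to max(datetime), a sensor whose very first or last hours are missing will not get NA rows at the start or end. The following is a minimal sketch that instead uses one fixed start and end for every site/sensor pair, as the question asks, and then lists the hours that were added. It assumes datetime is POSIXct in UTC; the helper fill_missing_hours() and the toy data are hypothetical, made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: one row per existing site/sensor pair for every hour
# from `start` to `end`; hours with no reading get NA in hourly_avg.
fill_missing_hours <- function(data, start, end) {
  data %>%
    ungroup() %>%
    complete(
      nesting(site_id, sensor_id),               # only pairs that occur in the data
      datetime = seq(start, end, by = "hour")    # same hourly grid for every pair
    ) %>%
    arrange(site_id, sensor_id, datetime)
}

# Hypothetical toy data (assumed UTC): arc1046 / 0a has no reading at 11:00
df <- tibble(
  site_id    = c("arc1045", "arc1045", "arc1045", "arc1046", "arc1046"),
  sensor_id  = c("0a", "0a", "0a", "0a", "0a"),
  datetime   = as.POSIXct(c("2019-11-15 09:00:00", "2019-11-15 10:00:00",
                            "2019-11-15 11:00:00", "2019-11-15 09:00:00",
                            "2019-11-15 10:00:00"), tz = "UTC"),
  hourly_avg = c(3.67, 4.68, 5.63, 3.67, 4.68)
)

filled <- fill_missing_hours(
  df,
  start = as.POSIXct("2019-11-15 09:00:00", tz = "UTC"),
  end   = as.POSIXct("2019-11-15 11:00:00", tz = "UTC")
)

# The rows complete() added, i.e. the missing readings
# (unlike filter(is.na(hourly_avg)), this ignores NAs already present in df)
missing_rows <- anti_join(filled, df, by = c("site_id", "sensor_id", "datetime"))
```

For the real data, the start and end from the question would be passed instead (as.POSIXct("2019-11-15 09:00:00", tz = "UTC") and as.POSIXct("2020-08-25 15:00:11", tz = "UTC")); seq() then stops at 2020-08-25 15:00:00, because the next hourly step would be past the end.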